docs: Add initial density heatmap docs (#626)

Fixes #608. #598 should be merged first. Adds some initial density heatmap docs.
deephaven · Jul 25, 2024 · 2dfbe0f
1 parent b5c51ad
commit 2dfbe0f

Showing 3 changed files with 103 additions and 5 deletions.
diff --git a/plugins/plotly-express/docs/README.md b/plugins/plotly-express/docs/README.md
@@ -39,6 +39,15 @@ This page contains a collection of links to examples demonstrating different plo
 
 </CardList>
 
+### 2D Distribution Plots
+
+<CardList>
+
+[![Density Heatmap - Displays the distribution of continuous variables using a grid](_assets/plot_icons/density_heatmap.svg)](density_heatmap.md)
+
+</CardList>
+
+
 ### Financial Plots
 
 <CardList>

diff --git a/plugins/plotly-express/docs/density_heatmap.md b/plugins/plotly-express/docs/density_heatmap.md
@@ -0,0 +1,90 @@
+# Density Heatmap Plot
+
+A density heatmap plot is a data visualization that uses a colored grid to represent the count of data points over two columns or, more generally, an aggregation of a third column over those two. The grid is divided into cells, each colored according to the aggregated value of the data points that fall within it. Passing in one independent variable and one dependent variable provides an approximate replacement for a scatter plot when there are too many data points to visualize easily. Providing two independent variables and a third dependent variable allows a more general aggregation to assess a specific metric of the data distribution. The number of grid bins significantly impacts the visualization. Currently, the grid bins default to 10 on each axis.
+
+#### When are density heatmap plots appropriate? 
+
+Density heatmap plots are appropriate when the data contains two continuous variables of interest and optionally a third dependent variable.
+
+#### What are density heatmap plots useful for? 
+
+- **Scatter Plot Replacement**: When dealing with a large number of data points, density heatmaps provide a more concise, informative and performant visualization than a scatter plot.
+- **2D Density Estimation**: Density heatmaps can serve as the basis for 2D density estimation methods, helping to model and understand underlying data distributions, which is crucial in statistical analysis and machine learning.
+- **Metric Assessment**: By aggregating data points within each cell, density heatmaps can provide insights into the distribution of a specific metric or value across different regions, highlighting groups for further analysis.
+
+## Examples
+
+### A basic density heatmap
+
+Visualize the counts of data points between two continuous variables within a grid. This can replace a scatter plot when there are too many data points to visualize easily.
+
+```python order=heatmap,iris
+import deephaven.plot.express as dx
+iris = dx.data.iris()
+
+# Create a basic density heatmap by specifying columns for the `x` and `y` axes
+heatmap = dx.density_heatmap(iris, x="petal_length", y="petal_width")
+```
+
+### A density heatmap with a custom color scale
+
+Visualize the counts of data points between two continuous variables within a grid with a custom color scale.
+
+```py order=heatmap_colorscale,iris
+import deephaven.plot.express as dx
+iris = dx.data.iris() # Import a ticking version of the Iris dataset
+
+# Color the heatmap using the "viridis" color scale with a range from 5 to 8
+heatmap_colorscale = dx.density_heatmap(
+    iris,
+    x="petal_length", 
+    y="petal_width", 
+    color_continuous_scale="viridis", 
+    range_color=[5, 8]
+)
+```
+
+### A density heatmap with a custom grid size and range
+
+Visualize the counts of data points between two continuous variables within a grid with a custom grid size and range. The number of bins significantly impacts the visualization by changing the granularity of the grid.
+
+```py order=heatmap_bins,iris
+import deephaven.plot.express as dx
+iris = dx.data.iris() # import a ticking version of the Iris dataset
+
+# Create a density heatmap with 20 bins on each axis, restricting the x-axis to values from 3 upward.
+# Passing None as the upper bound leaves it at the maximum value in the data.
+heatmap_bins = dx.density_heatmap(
+    iris, 
+    x="petal_length", 
+    y="petal_width", 
+    nbinsx=20,
+    nbinsy=20,
+    range_bins_x=[3, None],  
+)
+```
+
+### A density heatmap with a custom aggregation function
+
+Visualize the average of a third dependent continuous variable across the grid. The `histfunc` argument can only be used when three columns are provided. Possible values are `"abs_sum"`, `"avg"`, `"count"`, `"count_distinct"`, `"max"`, `"median"`, `"min"`, `"std"`, `"sum"`, and `"var"`.
+
+```py order=heatmap_aggregation,iris
+
+import deephaven.plot.express as dx
+iris = dx.data.iris() # import a ticking version of the Iris dataset
+
+# Create a density heatmap with an average aggregation function.
+heatmap_aggregation = dx.density_heatmap(
+    iris, 
+    x="petal_length", 
+    y="petal_width", 
+    z="sepal_length", 
+    histfunc="avg"
+)
+```
+
+
+## API Reference
+```{eval-rst}
+.. dhautofunction:: deephaven.plot.express.density_heatmap
+```
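
The grid aggregation that density_heatmap.md describes can be sketched in plain Python. This is an illustrative sketch only, not the plugin's implementation: `bin_index` and `density_grid` are hypothetical helpers, and the equal-width binning, the inclusive range bounds, and the fallback to the data minimum or maximum when a bound is `None` are assumptions made for the example.

```python
def bin_index(value, lo, hi, nbins):
    """Return the equal-width bin index of value in [lo, hi], or None if outside."""
    if value < lo or value > hi:
        return None
    if hi == lo:
        return 0
    # Clamp so that value == hi lands in the last bin instead of one past it.
    return min(int((value - lo) / (hi - lo) * nbins), nbins - 1)


def density_grid(xs, ys, zs=None, nbinsx=10, nbinsy=10,
                 range_x=(None, None), range_y=(None, None)):
    """Return an nbinsy x nbinsx grid of counts, or of mean z when zs is given."""
    # Assumption: a None bound falls back to the data minimum or maximum.
    x_lo = min(xs) if range_x[0] is None else range_x[0]
    x_hi = max(xs) if range_x[1] is None else range_x[1]
    y_lo = min(ys) if range_y[0] is None else range_y[0]
    y_hi = max(ys) if range_y[1] is None else range_y[1]

    counts = [[0] * nbinsx for _ in range(nbinsy)]
    sums = [[0.0] * nbinsx for _ in range(nbinsy)]
    for i, (x, y) in enumerate(zip(xs, ys)):
        col = bin_index(x, x_lo, x_hi, nbinsx)
        row = bin_index(y, y_lo, y_hi, nbinsy)
        if col is None or row is None:
            continue  # point lies outside the requested range
        counts[row][col] += 1
        if zs is not None:
            sums[row][col] += zs[i]

    if zs is None:
        return counts
    # Empty cells have no mean, so they are reported as None.
    return [[sums[r][c] / counts[r][c] if counts[r][c] else None
             for c in range(nbinsx)] for r in range(nbinsy)]


# Made-up sample values, not the Iris dataset.
xs = [1.0, 1.2, 4.5, 4.7, 5.0]
ys = [0.2, 0.2, 1.5, 1.6, 2.0]
zs = [5.0, 5.2, 6.0, 6.4, 7.0]

counts = density_grid(xs, ys, nbinsx=2, nbinsy=2)
means = density_grid(xs, ys, zs, nbinsx=2, nbinsy=2)
clipped = density_grid(xs, ys, nbinsx=2, nbinsy=2, range_x=(3, None))
print(counts)
```

With `zs` omitted, the sketch counts points per cell, mirroring the two-column case. With `zs` provided, it averages the third column per cell, which is analogous to passing `histfunc="avg"`. The `clipped` call mirrors the `range_bins_x=[3, None]` example by dropping points whose x value is below 3.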
diff --git a/plugins/plotly-express/docs/scatter.md b/plugins/plotly-express/docs/scatter.md
@@ -424,20 +424,19 @@ In situations where scatter plots become impractical due to overlaping markers i
 
 For large but manageable datasets, setting an appropriate opacity can be beneficial, as it helps address data overlap issues, making the individual data points more distinguishable and enhancing overall visualization clarity.
 
-<!-- TODO: link to density heatmap -->
+[Density Heatmap](density_heatmap.md)
 
 ```python order=density_heatmap,scatter_plot_opacity
 import deephaven.plot.express as dx
 my_table = dx.data.iris() # import the example iris data set
 
-# TODO: Method doesn't exist yet
-# Consider a 2d Histograms for large data sets
-density_heatmap = dx.density_heatmap(my_table, x="sepal_width", y="sepal_length")
+# Consider a density heatmap for large data sets
+density_heatmap = dx.density_heatmap(my_table, x="sepal_width", y="sepal_length")
 
 scatter_plot_opacity = dx.scatter(
     my_table,
     x="sepal_width",
-    y="sepal_length"
+    y="sepal_length",
     # For data sets with a high degree of overlap between points, consider setting opacity
     opacity=0.5
 )
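
The effect of `opacity` on overlapping markers in scatter.md can be estimated with a small calculation. The sketch below assumes standard source-over alpha compositing of identical markers, which may not match the renderer exactly. `stacked_opacity` is a hypothetical helper and is not part of `deephaven.plot.express`.

```python
def stacked_opacity(alpha: float, n: int) -> float:
    """Effective opacity of n identical markers drawn on top of each other.

    Assumes source-over compositing: each marker lets a fraction
    (1 - alpha) of what is beneath it show through.
    """
    return 1.0 - (1.0 - alpha) ** n


# Effective opacity for 1, 2, 5, and 10 overlapping markers at opacity=0.5.
for n in (1, 2, 5, 10):
    print(n, stacked_opacity(0.5, n))
```

Under this assumption, a handful of overlapping points at `opacity=0.5` still produce visibly different shades, but the effective opacity approaches 1 quickly as the overlap grows. That saturation is one reason a density heatmap may suit very dense data better than a scatter plot with reduced opacity.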