docs: Add initial density heatmap docs (#626)
Fixes #608 
#598 should be merged first

Adds some initial density heatmap docs
jnumainville authored Jul 25, 2024
1 parent b5c51ad commit 2dfbe0f
Showing 3 changed files with 103 additions and 5 deletions.
9 changes: 9 additions & 0 deletions plugins/plotly-express/docs/README.md
@@ -39,6 +39,15 @@ This page contains a collection of links to examples demonstrating different plo

</CardList>

### 2D Distribution Plots

<CardList>

[![Density Heatmap - Displays the distribution of continuous variables using a grid](_assets/plot_icons/density_heatmap.svg)](density_heatmap.md)

</CardList>


### Financial Plots

<CardList>
90 changes: 90 additions & 0 deletions plugins/plotly-express/docs/density_heatmap.md
@@ -0,0 +1,90 @@
# Density Heatmap Plot

A density heatmap plot is a data visualization that uses a colored grid to represent either a count of data points over two columns or, more generally, an aggregation of a third column over those two. The grid is divided into cells, each colored according to the aggregated value of the data points that fall within it. Passing in one independent variable and one dependent variable provides an approximate replacement for a scatter plot when there are too many data points to be easily visualized. Providing two independent variables and a third dependent variable allows for a more general aggregation to assess a specific metric of the data distribution. The number of grid bins significantly impacts the visualization. Currently, the grid bins default to 10 on each axis.
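
To make the binning idea concrete, here is a minimal pure-Python sketch of how a count grid could be built from two columns. It is an illustration only and not how `dx.density_heatmap` is implemented. The helper name `count_grid`, the sample values, and the choice to place the maximum value in the last bin are all assumptions.

```python
def count_grid(xs, ys, nbins=10):
    """Count points per cell of an nbins x nbins grid (hypothetical helper)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def bin_index(value, lo, hi):
        # Assumption: a degenerate range puts everything in the first bin
        if hi == lo:
            return 0
        idx = int((value - lo) / (hi - lo) * nbins)
        # Assumption: the maximum value is counted in the last bin
        return min(idx, nbins - 1)

    # grid[row][col] is indexed as grid[y_bin][x_bin]
    grid = [[0] * nbins for _ in range(nbins)]
    for x, y in zip(xs, ys):
        grid[bin_index(y, y_min, y_max)][bin_index(x, x_min, x_max)] += 1
    return grid


# Made-up sample values, not taken from the Iris dataset
xs = [0.0, 0.1, 0.2, 5.0, 9.9, 10.0]
ys = [0.0, 0.1, 0.3, 5.0, 9.8, 10.0]
grid = count_grid(xs, ys)
```

Each cell's count is what drives its color, so fewer bins merge neighboring points into the same cell while more bins spread them out.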

#### When are density heatmap plots appropriate?

Density heatmap plots are appropriate when the data contains two continuous variables of interest and optionally a third dependent variable.

#### What are density heatmap plots useful for?

- **Scatter Plot Replacement**: When dealing with a large number of data points, density heatmaps provide a more concise, informative and performant visualization than a scatter plot.
- **2D Density Estimation**: Density heatmaps can serve as the basis for 2D density estimation methods, helping to model and understand underlying data distributions, which is crucial in statistical analysis and machine learning.
- **Metric Assessment**: By aggregating data points within each cell, density heatmaps can provide insights into the distribution of a specific metric or value across different regions, highlighting groups for further analysis.

## Examples

### A basic density heatmap

Visualize the counts of data points between two continuous variables within a grid. This can replace a scatter plot when there are too many data points to be easily visualized.

```python order=heatmap,iris
import deephaven.plot.express as dx
iris = dx.data.iris()

# Create a basic density heatmap by specifying columns for the `x` and `y` axes
heatmap = dx.density_heatmap(iris, x="petal_length", y="petal_width")
```

### A density heatmap with a custom color scale

Visualize the counts of data points between two continuous variables within a grid with a custom color scale.

```py order=heatmap_colorscale,iris
import deephaven.plot.express as dx
iris = dx.data.iris() # Import a ticking version of the Iris dataset

# Color the heatmap using the "viridis" color scale with a range from 5 to 8
heatmap_colorscale = dx.density_heatmap(
    iris,
    x="petal_length",
    y="petal_width",
    color_continuous_scale="viridis",
    range_color=[5, 8]
)
```
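
As a rough illustration of what a color range does, the sketch below maps a cell value to a position between 0 and 1 on a continuous color scale. It assumes the usual Plotly behavior in which values outside the range take the color at the nearest end of the scale. The helper `color_position` is hypothetical and not part of `deephaven.plot.express`.

```python
def color_position(value, range_color):
    """Map a cell value to a 0..1 position on a color scale (hypothetical helper)."""
    low, high = range_color
    # Assumption: values outside the range are clamped to the nearest bound
    clamped = max(low, min(high, value))
    return (clamped - low) / (high - low)


# Made-up cell counts, evaluated against the range used in the example above
positions = [color_position(count, [5, 8]) for count in (2, 5, 6.5, 8, 20)]
```

Under this assumption, a narrow range such as `[5, 8]` spends the whole color scale on cells with moderate counts, while sparser and denser cells saturate at either end.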

### A density heatmap with a custom grid size and range

Visualize the counts of data points between two continuous variables within a grid with a custom grid size and range. The number of bins significantly impacts the visualization by changing the granularity of the grid.

```py order=heatmap_bins,iris
import deephaven.plot.express as dx
iris = dx.data.iris() # import a ticking version of the Iris dataset

# Create a density heatmap with 20 bins on each axis and a range from 3 to the maximum value for the x-axis.
# None is used to specify an upper bound of the maximum value.
heatmap_bins = dx.density_heatmap(
    iris,
    x="petal_length",
    y="petal_width",
    nbinsx=20,
    nbinsy=20,
    range_bins_x=[3, None],
)
```
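
The following sketch shows one way bin edges could be derived from a bin count and a partially specified range, where `None` falls back to the data's own minimum or maximum. It is a conceptual aid under those assumptions, not the library's implementation. The helper `bin_edges` and the sample values are hypothetical.

```python
def bin_edges(values, nbins, range_bins=None):
    """Return nbins + 1 evenly spaced bin edges (hypothetical helper)."""
    low, high = range_bins if range_bins is not None else (None, None)
    # Assumption: a bound of None means "use the data's own extreme"
    if low is None:
        low = min(values)
    if high is None:
        high = max(values)
    width = (high - low) / nbins
    return [low + i * width for i in range(nbins + 1)]


# Made-up sample values, not taken from the Iris dataset
petal_lengths = [1.0, 1.5, 3.0, 4.5, 8.0]
edges = bin_edges(petal_lengths, nbins=20, range_bins=[3, None])
```

With 20 bins between 3 and the sample maximum of 8, each bin in this sketch is 0.25 wide. Halving the bin count would double that width and coarsen the grid.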

### A density heatmap with a custom aggregation function

Visualize the average of a third dependent continuous variable across the grid. A `histfunc` can only be used when three columns are provided. Possible values are `"abs_sum"`, `"avg"`, `"count"`, `"count_distinct"`, `"max"`, `"median"`, `"min"`, `"std"`, `"sum"`, and `"var"`.

```py order=heatmap_aggregation,iris

import deephaven.plot.express as dx
iris = dx.data.iris() # import a ticking version of the Iris dataset

# Create a density heatmap with an average aggregation function.
heatmap_aggregation = dx.density_heatmap(
    iris,
    x="petal_length",
    y="petal_width",
    z="sepal_length",
    histfunc="avg"
)
```
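
To illustrate what aggregating a third column per cell means, here is a small pure-Python sketch that groups `(x, y, z)` rows into square cells and applies an aggregation to the `z` values in each cell. It covers only a few of the aggregations listed above and is not the library's implementation. The names `AGGREGATIONS` and `aggregate_cells`, the fixed `cell_size`, and the sample rows are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from a few histfunc names to plain Python functions
AGGREGATIONS = {"avg": mean, "count": len, "max": max, "min": min, "sum": sum}


def aggregate_cells(rows, cell_size, histfunc="avg"):
    """Aggregate z over square grid cells; rows are (x, y, z) tuples."""
    cells = defaultdict(list)
    for x, y, z in rows:
        # Assumption: cells are identified by integer (x_bin, y_bin) indices
        cell = (int(x // cell_size), int(y // cell_size))
        cells[cell].append(z)
    aggregate = AGGREGATIONS[histfunc]
    return {cell: aggregate(zs) for cell, zs in cells.items()}


# Made-up sample rows, not taken from the Iris dataset
rows = [(0.2, 0.3, 4.0), (0.4, 0.1, 6.0), (1.5, 0.2, 5.0)]
averages = aggregate_cells(rows, cell_size=1.0)
counts = aggregate_cells(rows, cell_size=1.0, histfunc="count")
```

In this sketch the first two rows share a cell, so `"avg"` reports the mean of their `z` values while `"count"` reports how many rows landed there.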


## API Reference
```{eval-rst}
.. dhautofunction:: deephaven.plot.express.density_heatmap
```
9 changes: 4 additions & 5 deletions plugins/plotly-express/docs/scatter.md
@@ -424,20 +424,19 @@ In situations where scatter plots become impractical due to overlapping markers i

For large but manageable datasets, setting an appropriate opacity can be beneficial as it helps address data overlap issues, making the individual data points more distinguishable and enhancing overall visualization clarity.

-<!-- TODO: link to density heatmap -->
+[Density Heatmap](density_heatmap.md)

```python order=density_heatmap,scatter_plot_opacity
import deephaven.plot.express as dx
my_table = dx.data.iris() # import the example iris data set

-# TODO: Method doesn't exist yet
-# Consider a 2d Histograms for large data sets
-density_heatmap = dx.density_heatmap(my_table, x="sepal_width", y="sepal_length")
+# Consider a density heatmap for large data sets
+heatmap_replacement = dx.density_heatmap(my_table, x="sepal_width", y="sepal_length")

scatter_plot_opacity = dx.scatter(
    my_table,
    x="sepal_width",
-    y="sepal_length"
+    y="sepal_length",
    # For data sets with a high degree of overlap between points, consider setting opacity
    opacity=0.5
)
