deephaven · alexpeters1208 · Jul 29, 2024 · Jun 11, 2024 · Jun 12, 2024 · Jun 13, 2024
diff --git a/plugins/plotly-express/docs/area.md b/plugins/plotly-express/docs/area.md
@@ -8,38 +8,37 @@ Area plots are appropriate when the data contain a continuous response variable
 
 - **Visualizing trends over time**: Area plots are great for displaying the trend of a single continuous variable. The filled areas can make it easier to see the magnitude of changes and trends compared to line plots.
 - **Displaying cumulative totals**: Area plots are effective in showing cumulative totals over a period. They can help in understanding the contribution of different categories to the total amount and how these contributions evolve.
-- **Comparing multiple categories**: Rather than providing a single snapshot of the composition of a total, area plots show how contributions from each category change over time. The different colored or shaded areas help distinguish each category, making it easier to see their individual contributions and to compare how those categories evolve.
+- **Comparing multiple categories**: Rather than providing a single snapshot of the composition of a total, area plots show how contributions from each category change over time.
 
 ## Examples
 
 ### A basic area plot
 
-Visualize the relationship between two variables. In this case, an area plot is similar to a line plot.
+Visualize the relationship between two variables by passing each column name to the `x` and `y` arguments.
 
 ```python order=area_plot,usa_population
 import deephaven.plot.express as dx
-gapminder = dx.data.gapminder() # import a ticking version of the Gapminder dataset
+gapminder = dx.data.gapminder()
 
 # subset to get a specific group
-usa_population = gapminder.where("country == `United States`")
+usa_population = gapminder.where("Country == `United States`")
 
-# create a basic area plot by specifying columns for the `x` and `y` axes
-area_plot = dx.area(usa_population, x="year", y="pop")
+area_plot = dx.area(usa_population, x="Year", y="Pop")
 ```
 
-### Color by group
+### Area by group
 
-Area plots are unique in that the y-axis demonstrates each groups' total contribution to the whole. Use the `by` argument to specify a grouping column.
+Area plots are unique in that the y-axis demonstrates each group's total contribution to the whole. Pass the name of the grouping column(s) to the `by` argument.
 
-```python order=area_plot_multi,large_countries_population
+```python order=area_plot_group,large_countries_population
 import deephaven.plot.express as dx
-gapminder = dx.data.gapminder() # import a ticking version of the Gapminder dataset
+gapminder = dx.data.gapminder()
 
-# subset to get a few categories to compare
-large_countries_population = gapminder.where("country in `United States`, `India`, `China`")
+# subset to get several countries to compare
+large_countries_population = gapminder.where("Country in `United States`, `India`, `China`")
 
-# the `by` uses unique values in the supplied column to color the plot according to those column values
-area_plot_multi = dx.area(large_countries_population, x="year", y="pop", by="country")
+# cumulative trend showing contribution from each group
+area_plot_group = dx.area(large_countries_population, x="Year", y="Pop", by="Country")
 ```
 
 ## API Reference
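The "contribution to the whole" idea in the grouped area example of `area.md` can be sketched in plain Python, independent of Deephaven. This is a minimal illustration that assumes the bands are stacked on top of one another, as the description suggests. The group names, the values, and the `stack` helper are hypothetical and are not taken from the Gapminder dataset or from the `dx.area` implementation.

```python
# Hypothetical per-year values for three groups (not real Gapminder data).
years = [2000, 2005]
series = {
    "A": [10, 12],
    "B": [20, 25],
    "C": [5, 6],
}


def stack(series):
    """Return the upper boundary of each group's band when bands are stacked."""
    running = [0] * len(next(iter(series.values())))
    tops = {}
    for name, values in series.items():
        # each band sits on top of the running total of the bands before it
        running = [r + v for r, v in zip(running, values)]
        tops[name] = running
    return tops


stacked = stack(series)
```

Under this assumption, the top of the last band is the total across all groups for each year, and the height of each band is that group's own contribution.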

diff --git a/plugins/plotly-express/docs/bar.md b/plugins/plotly-express/docs/bar.md
@@ -1,6 +1,6 @@
 # Bar Plot
 
-A bar plot is a graphical representation of data that uses rectangular bars to display the values of different categories or groups, making it easy to compare and visualize the distribution of data.
+A bar plot is a graphical representation of data that uses rectangular bars to display the values of different categories or groups. Bar plots aggregate the response variable across the entire dataset for each category, so that the y-axis represents the sum of the response variable per category.
 
 Bar plots are appropriate when the data contain a continuous response variable that is directly related to a categorical explanatory variable. Additionally, if the response variable is a cumulative total of contributions from different subcategories, each bar can be broken up to demonstrate those contributions.
 
@@ -14,29 +14,40 @@ Bar plots are appropriate when the data contain a continuous response variable t
 
 ### A basic bar plot
 
-Visualize the relationship between a continuous variable and a categorical or discrete variable. By default, the y-axis shows the cumulative value for each group over the whole dataset.
+Visualize the relationship between a continuous variable and a categorical or discrete variable by passing the column names to the `x` and `y` arguments.
 
 ```python order=bar_plot,tips
 import deephaven.plot.express as dx
-tips = dx.data.tips() # import a ticking version of the Tips dataset
+tips = dx.data.tips()
 
-# create a basic bar plot by specifying columns for the `x` and `y` axes
-bar_plot = dx.bar(tips, x="day", y="total_bill")
+bar_plot = dx.bar(tips, x="Day", y="TotalBill")
+```
+
+Change the x-axis ordering by sorting the dataset by the categorical variable.
+
+```python order=ordered_bar_plot,tips
+import deephaven.plot.express as dx
+tips = dx.data.tips()
+
+# sort the dataset to get a specific x-axis ordering; sort() orders string values alphabetically
+ordered_bar_plot = dx.bar(tips.sort("Day"), x="Day", y="TotalBill")
 ```
 
 ### Partition bars by group
 
-Use the `by` argument to break each bar up into contributions from the given group.
+Break bars down by group by passing the name of the grouping column(s) to the `by` argument.
 
 ```python order=bar_plot_smoke,bar_plot_sex,tips
 import deephaven.plot.express as dx
-tips = dx.data.tips() # import a ticking version of the Tips dataset
+tips = dx.data.tips()
+
+sorted_tips = tips.sort("Day")
 
-# Ex 1. Partition bars by smoker / non-smoker
-bar_plot_smoke = dx.bar(tips, x="day", y="total_bill", by="smoker")
+# group by smoker / non-smoker
+bar_plot_smoke = dx.bar(sorted_tips, x="Day", y="TotalBill", by="Smoker")
 
-# Ex 2. Partition bars by male / female
-bar_plot_sex = dx.bar(tips, x="day", y="total_bill", by="sex")
+# group by male / female
+bar_plot_sex = dx.bar(sorted_tips, x="Day", y="TotalBill", by="Sex")
 ```
 
 ## API Reference
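The aggregation described in the `bar.md` introduction (the y-axis shows the sum of the response variable per category) can be sketched in plain Python. This is only an illustration of the arithmetic, not the plugin's implementation. The rows and the `bar_heights` helper are hypothetical and do not come from the Tips dataset.

```python
from collections import defaultdict

# Hypothetical (Day, TotalBill) rows; not taken from the Tips dataset.
rows = [
    ("Thur", 10.0),
    ("Fri", 20.5),
    ("Thur", 5.5),
    ("Sat", 30.0),
    ("Fri", 4.5),
]


def bar_heights(rows):
    """Sum the response variable for each category."""
    totals = defaultdict(float)
    for day, bill in rows:
        totals[day] += bill
    return dict(totals)


heights = bar_heights(rows)

# Sorting the categories as strings gives alphabetical order,
# which is not the same as calendar order for day names.
ordered = sorted(heights)
```

The last step shows why sorting by a string column such as `Day` yields an alphabetical x-axis rather than a chronological one.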

diff --git a/plugins/plotly-express/docs/box.md b/plugins/plotly-express/docs/box.md
@@ -1,42 +1,43 @@
 # Box Plot
 
-A box plot, also known as a box-and-whisker plot, is a data visualization that presents a summary of a dataset's distribution. It displays key statistics such as the median, quartiles, and potential outliers, making it a useful tool for visually representing the central tendency and variability of data.
+A box plot, also known as a box-and-whisker plot, is a data visualization that presents a summary of a dataset's distribution. It displays key statistics such as the median, quartiles, and potential outliers, making it a useful tool for visually representing the central tendency and variability of data. To learn more about the mathematics involved in creating box plots, check out [this article](https://asq.org/quality-resources/box-whisker-plot).
 
-Box plots are appropriate when the data have a continuous variable of interest. If there is an additional categorical variable that the variable of interest depends on, side-by-side box plots may be appropriate.
+Box plots are appropriate when the data have a continuous variable of interest. If there is an additional categorical variable that the variable of interest depends on, side-by-side box plots created with the `by` argument may be appropriate.
 
 ### What are box plots useful for?
 
 - **Visualizing overall distribution**: Box plots reveal the distribution of the variable of interest. They are good first-line tools for assessing whether a variable's distribution is symmetric, right-skewed, or left-skewed.
 - **Assessing center and spread**: A box plot displays the center (median) of a dataset using the middle line, and displays the spread (IQR) using the width of the box.
-- **Identifying potential outliers**: The dots displayed outside of the fenceposts in a box plot are considered candidates for being outliers. These should be examined closely, and their frequency can help determine whether the data come from a heavy-tailed distribution.
+- **Identifying potential outliers**: The dots displayed in a box plot are considered candidates for being outliers. These should be examined closely, and their frequency can help determine whether the data come from a heavy-tailed distribution.
 
 ## Examples
 
 ### A basic box plot
 
-Visualize the distribution of a single continuous variable using a box plot. Singular points lying outside the "fences" are candidates for being outliers.
+Visualize the distribution of a single variable by passing the column name to `x` or `y`.
 
-```python order=total_bill_plot,tips
+```python order=box_plot_x,box_plot_y,tips
 import deephaven.plot.express as dx
-tips = dx.data.tips() # import a ticking version of the Tips dataset
+tips = dx.data.tips()
 
-# create a basic box plot by specifying the variable of interest with `y`
-total_bill_plot = dx.box(tips, y="total_bill")
+# control the plot orientation using `x` or `y`
+box_plot_x = dx.box(tips, x="TotalBill")
+box_plot_y = dx.box(tips, y="TotalBill")
 ```
 
 ### Distributions for multiple groups
 
-Box plots are useful for comparing the distributions of two or more groups of data. Use the `by` argument to specify a grouping column.
+Box plots are useful for comparing the distributions of two or more groups of data. Pass the name of the grouping column(s) to the `by` argument.
 
-```python order=total_bill_smoke,total_bill_sex,tips
+```python order=box_plot_group_1,box_plot_group_2,tips
 import deephaven.plot.express as dx
-tips = dx.data.tips() # import a ticking version of the Tips dataset
+tips = dx.data.tips()
 
-# Ex 1. Total bill distribution by smoker / non-smoker
-total_bill_smoke = dx.box(tips, y="total_bill", by="smoker")
+# total bill distribution by smoker / non-smoker
+box_plot_group_1 = dx.box(tips, y="TotalBill", by="Smoker")
 
-# Ex 2. Total bill distribution by male / female
-total_bill_sex = dx.box(tips, y="total_bill", by="sex")
+# total bill distribution by male / female
+box_plot_group_2 = dx.box(tips, y="TotalBill", by="Sex")
 ```
 
 ## API Reference
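The summary statistics that `box.md` mentions (median, quartiles, IQR, and outlier candidates) can be sketched with Python's standard library. Quartile conventions differ between tools, so the numbers below may not match what the plot draws. The 1.5 × IQR fence rule is the common Tukey convention and is an assumption here, as are the sample data and the `box_summary` helper.

```python
import statistics

# Hypothetical sample with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]


def box_summary(values, k=1.5):
    """Compute quartiles, IQR fences, and outlier candidates."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # points beyond the fences are candidates for being outliers
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "outliers": outliers,
    }


summary = box_summary(data)
```

Points flagged this way are only candidates; whether they are genuine outliers depends on the data and the question being asked.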

diff --git a/plugins/plotly-express/docs/candlestick.md b/plugins/plotly-express/docs/candlestick.md
@@ -8,42 +8,41 @@ In a bullish (upward, typically shown as green) candlestick, the open is typical
 
 ### What are candlestick plots useful for?
 
-- **Analyzing financial markets**: Candlestick plots are a standard tool in technical analysis for understanding price movements, identifying trends, and potential reversal points in financial markets, such as stocks, forex, and cryptocurrencies.
+- **Analyzing financial markets**: Candlestick plots are a standard tool in technical analysis for understanding price movements and identifying trends and potential reversal points in financial instruments, such as stocks, forex, and cryptocurrencies.
 - **Short to medium-term trading**: Candlestick patterns are well-suited for short to medium-term trading strategies, where timely decisions are based on price patterns and trends over a specific time frame.
 - **Visualizing variation in price data**: Candlestick plots offer a visually intuitive way to represent variability in price data, making them valuable for traders and analysts who prefer a visual approach to data analysis.
 
 ## Examples
 
 ### A basic candlestick plot
 
-Visualize the key summary statistics of a single continuous variable as it evolves.
+Visualize the key summary statistics of a stock price as it evolves. Specify the time column with `x`, and pass the appropriate column names to the `open`, `high`, `low`, and `close` arguments.
 
 ```python order=candlestick_plot,stocks_1min_ohlc,stocks
 import deephaven.plot.express as dx
 import deephaven.agg as agg
-stocks = dx.data.stocks()  # import the example stock market data set
+stocks = dx.data.stocks()
 
 # compute ohlc per symbol for each minute
 stocks_1min_ohlc = stocks.update_view(
-    "binnedTimestamp = lowerBin(timestamp, 'PT1m')"
+    "BinnedTimestamp = lowerBin(Timestamp, 'PT1m')"
 ).agg_by(
     [
-        agg.first("open=price"),
-        agg.max_("high=price"),
-        agg.min_("low=price"),
-        agg.last("close=price"),
+        agg.first("Open=Price"),
+        agg.max_("High=Price"),
+        agg.min_("Low=Price"),
+        agg.last("Close=Price"),
     ],
-    by=["sym", "binnedTimestamp"],
+    by=["Sym", "BinnedTimestamp"],
 )
 
-# create a basic candlestick plot - the `open`, `high`, `low`, and `close` arguments must be specified
 candlestick_plot = dx.candlestick(
-    stocks_1min_ohlc.where("sym == `DOG`"),
-    x="binnedTimestamp",
-    open="open",
-    high="high",
-    low="low",
-    close="close",
+    stocks_1min_ohlc.where("Sym == `DOG`"),
+    x="BinnedTimestamp",
+    open="Open",
+    high="High",
+    low="Low",
+    close="Close",
 )
 ```
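The per-minute aggregation in the `candlestick.md` example (bin each timestamp to the start of its minute, then take the first, maximum, minimum, and last price per bin) can be sketched in plain Python for a single symbol. This mirrors the logic of the example only; it is not how Deephaven computes it. The trades and the `ohlc_by_minute` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical (timestamp, price) trades for a single symbol.
trades = [
    (datetime(2024, 1, 1, 9, 30, 5), 10.0),
    (datetime(2024, 1, 1, 9, 30, 40), 12.0),
    (datetime(2024, 1, 1, 9, 30, 59), 9.0),
    (datetime(2024, 1, 1, 9, 31, 10), 11.0),
]


def ohlc_by_minute(trades):
    """Group trades into one-minute bins and compute open/high/low/close."""
    bars = {}
    for ts, price in sorted(trades):
        # lower bound of the one-minute bin containing this trade
        bin_start = ts.replace(second=0, microsecond=0)
        bar = bars.get(bin_start)
        if bar is None:
            bars[bin_start] = {
                "Open": price,
                "High": price,
                "Low": price,
                "Close": price,
            }
        else:
            bar["High"] = max(bar["High"], price)
            bar["Low"] = min(bar["Low"], price)
            bar["Close"] = price  # last trade seen so far in this bin
    return bars


bars = ohlc_by_minute(trades)
```

Each key of `bars` plays the role of `BinnedTimestamp`, and the four values correspond to the columns passed to `open`, `high`, `low`, and `close`.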
 

diff --git a/plugins/plotly-express/docs/density_heatmap.md b/plugins/plotly-express/docs/density_heatmap.md
@@ -1,63 +1,59 @@
 # Density Heatmap Plot
 
-A density heatmap plot is a data visualization that uses a colored grid to represent a count over two columns or more (generally an aggregation over three columns). The grid is divided into cells colored based on the aggregated value of the data points that fall within each cell. Passing in one independent variable and one dependent variable provides an approximating replacement for a scatter plot when there are too many data points to be easily visualized. Providing two independent variables and a third dependent variable allows for a more general aggregation to assess a specific metric of the data distribution. The number of grid bins significantly impacts the visualization. Currently, the grid bins default to 10 on each axis.
+A density heatmap plot is a data visualization that uses a colored grid to represent the joint distribution of a pair of continuous variables. More generally, density heatmaps can be used to visualize any statistical aggregation over a pair of continuous variables. The pair of continuous variables may be explanatory and response variables. In this case, a density heatmap provides an approximation to a scatter plot when there are too many data points to be easily visualized. The number of grid bins significantly impacts the visualization. Currently, the grid bins default to 10 on each axis, yielding 100 bins in total.
 
-#### When are density heatmap plots appropriate? 
-
-Density heatmap plots are appropriate when the data contains two continuous variables of interest and optionally a third dependent variable.
+Density heatmaps are appropriate when the data contain two continuous variables of interest. An additional quantitative variable may be incorporated into the visualization using shapes or colors.
 
 #### What are density heatmap plots useful for? 
 
-- **Scatter Plot Replacement**: When dealing with a large number of data points, density heatmaps provide a more concise, informative and performant visualization than a scatter plot.
+- **Scatter Plot Replacement**: When dealing with a large number of data points, density heatmaps provide a more concise, informative and performant visualization than a [scatter plot](scatter.md).
 - **2D Density Estimation**: Density heatmaps can serve as the basis for 2D density estimation methods, helping to model and understand underlying data distributions, which is crucial in statistical analysis and machine learning.
 - **Metric Assessment**: By aggregating data points within each cell, density heatmaps can provide insights into the distribution of a specific metric or value across different regions, highlighting groups for further analysis.
 
 ## Examples
 
 ### A basic density heatmap
 
-Visualize the counts of data points between two continuous variables within a grid. This could possibly replace a scatter plot when there are too many data points to be easily visualized.
+Visualize the joint distribution of two variables by passing each column name to the `x` and `y` arguments.
 
 ```python order=heatmap,iris
 import deephaven.plot.express as dx
 iris = dx.data.iris()
 
-# Create a basic density heatmap by specifying columns for the `x` and `y` axes
-heatmap = dx.density_heatmap(iris, x="petal_length", y="petal_width")
+heatmap = dx.density_heatmap(iris, x="PetalLength", y="PetalWidth")
 ```
 
 ### A density heatmap with a custom color scale
 
-Visualize the counts of data points between two continuous variables within a grid with a custom color scale.
+Custom color scales can be provided to the `color_continuous_scale` argument, and their range can be defined with the `range_color` argument.
 
 ```py order=heatmap_colorscale,iris
 import deephaven.plot.express as dx
-iris = dx.data.iris() # Import a ticking version of the Iris dataset
+iris = dx.data.iris()
 
-# Color the heatmap using the "viridis" color scale with a range from 5 to 8
-heatmap_colorscale = dx.density_heatmap(
-    iris,
-    x="petal_length", 
-    y="petal_width", 
+# use the "viridis" color scale with a range from 5 to 8
+heatmap_colorscale = dx.density_heatmap(iris,
+    x="PetalLength", 
+    y="PetalWidth", 
     color_continuous_scale="viridis", 
     range_color=[5, 8]
 )
 ```
 
 ### A density heatmap with a custom grid size and range
 
-Visualize the counts of data points between two continuous variables within a grid with a custom grid size and range. The number of bins significantly impacts the visualization by changing the granularity of the grid.
+The number of bins on each axis can be set using the `nbinsx` and `nbinsy` arguments. The number of bins significantly impacts the visualization by changing the granularity of the grid.
 
 ```py order=heatmap_bins,iris
 import deephaven.plot.express as dx
-iris = dx.data.iris() # import a ticking version of the Iris dataset
+iris = dx.data.iris()
 
 # Create a density heatmap with 20 bins on each axis and a range from 3 to the maximum value for the x-axis. 
 # None is used to specify an upper bound of the maximum value.
 heatmap_bins = dx.density_heatmap(
     iris, 
-    x="petal_length", 
-    y="petal_width", 
+    x="PetalLength", 
+    y="PetalWidth", 
     nbinsx=20,
     nbinsy=20,
     range_bins_x=[3, None],  
@@ -66,23 +62,48 @@ heatmap_bins = dx.density_heatmap(
 
 ### A density heatmap with a custom aggregation function
 
-Visualize the average of a third dependent continuous variable across the grid. Histfuncs can only be used when three columns are provided. Possible histfuncs are `"abs_sum"`, `"avg"`, `"count"`, `"count_distinct"`, `"max"`, `"median"`, `"min"`, `"std"`, `"sum"`, and `"var"`.
+Use an additional continuous variable to color the heatmap. Many statistical aggregations can be computed on this column by providing the `histfunc` argument. Possible values for `histfunc` are `"abs_sum"`, `"avg"`, `"count"`, `"count_distinct"`, `"max"`, `"median"`, `"min"`, `"std"`, `"sum"`, and `"var"`.
 
 ```py order=heatmap_aggregation,iris
-
 import deephaven.plot.express as dx
-iris = dx.data.iris() # import a ticking version of the Iris dataset
+iris = dx.data.iris()
 
-# Create a density heatmap with an average aggregation function.
-heatmap_aggregation = dx.density_heatmap(
-    iris, 
-    x="petal_length", 
-    y="petal_width", 
-    z="sepal_length", 
+# color the map by the average of an additional continuous variable
+heatmap_aggregation = dx.density_heatmap(iris, 
+    x="PetalLength", 
+    y="PetalWidth", 
+    z="SepalLength", 
     histfunc="avg"
 )
 ```
 
+### Large datasets
+
+Visualize the joint distribution of a large dataset (10 million rows in this example) by passing each column name to the `x` and `y` arguments. Increasing the number of bins can produce a much smoother visualization.
+
+```python order=large_heatmap_2,large_heatmap_1,large_data
+from deephaven.plot import express as dx
+from deephaven import empty_table
+
+large_data = empty_table(10_000_000).update([
+    "X = 50 + 25 * cos(i * Math.PI / 180)",
+    "Y = 50 + 25 * sin(i * Math.PI / 180)",
+])
+
+# specify range to see entire plot
+large_heatmap_1 = dx.density_heatmap(large_data, x="X", y="Y", range_bins_x=[0,100], range_bins_y=[0,100])
+
+# increasing the number of bins may produce a more precise visualization
+large_heatmap_2 = dx.density_heatmap(
+    large_data,
+    x="X",
+    y="Y",
+    range_bins_x=[0,100],
+    range_bins_y=[0,100],
+    nbinsx=100, 
+    nbinsy=100
+)
+```
 
 ## API Reference
 ```{eval-rst}