docs: MVP plotly-express docs #554

Merged
merged 39 commits into from
Jul 29, 2024
d124531
Docs
alexpeters1208 Jun 11, 2024
af94f9f
Continue docs
alexpeters1208 Jun 12, 2024
6907460
More docs
alexpeters1208 Jun 13, 2024
8b56f55
Add polar and ternary examples
alexpeters1208 Jun 18, 2024
2fbf7b2
Add multiple-axes
alexpeters1208 Jun 20, 2024
03c067b
Start re-wording
alexpeters1208 Jun 20, 2024
275aa7d
Fix spacing
alexpeters1208 Jun 20, 2024
50ee789
Start "what are they useful for"
alexpeters1208 Jun 20, 2024
566fadc
Small language changes
alexpeters1208 Jun 20, 2024
6d15404
When are they appropriate
alexpeters1208 Jun 21, 2024
ad907ba
Update ecdf
alexpeters1208 Jun 21, 2024
1e21a4d
Merge branch 'main' into dx-min-docs
alexpeters1208 Jun 21, 2024
b615e3e
Update notes and warnings
alexpeters1208 Jun 21, 2024
1ad4acd
Simplify "what are they useful for"
alexpeters1208 Jun 21, 2024
d872a90
Don Area suggestion
alexpeters1208 Jun 24, 2024
d39b2b7
More review suggestions
alexpeters1208 Jun 28, 2024
643003a
Funnel plot
alexpeters1208 Jul 2, 2024
6eb71fb
Funnel, funnel area, timeline
alexpeters1208 Jul 3, 2024
9ec22f5
Merge branch 'main' into dx-min-docs
alexpeters1208 Jul 9, 2024
ab8ebf8
Start plot by
alexpeters1208 Jul 10, 2024
19dd7b3
More plot by
alexpeters1208 Jul 10, 2024
0cb05ce
Apply suggestions from code review
alexpeters1208 Jul 15, 2024
143249e
See how area plot renders
alexpeters1208 Jul 15, 2024
f9436bf
Check rendering
alexpeters1208 Jul 15, 2024
a47adfe
More tidying
alexpeters1208 Jul 16, 2024
d6638b6
More plot by
alexpeters1208 Jul 16, 2024
86c1b73
First round of revisions from Chip
alexpeters1208 Jul 24, 2024
4e4cc91
Scatter progress
alexpeters1208 Jul 25, 2024
6ef84bc
Merge branch 'main' into dx-min-docs
alexpeters1208 Jul 25, 2024
b4a186f
Revise scatter
alexpeters1208 Jul 25, 2024
ab8f2c4
More polish
alexpeters1208 Jul 25, 2024
778246e
More polish, Don review, add density heatmap
alexpeters1208 Jul 25, 2024
b5665a5
Pascal case
alexpeters1208 Jul 26, 2024
a562b64
Links
alexpeters1208 Jul 26, 2024
a432741
Revise concept pieces
alexpeters1208 Jul 26, 2024
c408bf2
Deterministic large datasets
alexpeters1208 Jul 26, 2024
f3c7ebe
Chip review
alexpeters1208 Jul 26, 2024
83fcb87
Chip and Joe suggestions
alexpeters1208 Jul 26, 2024
6dc0ed6
Move density heatmap up
alexpeters1208 Jul 29, 2024
27 changes: 13 additions & 14 deletions plugins/plotly-express/docs/area.md
@@ -8,38 +8,37 @@ Area plots are appropriate when the data contain a continuous response variable

- **Visualizing trends over time**: Area plots are great for displaying the trend of a single continuous variable. The filled areas can make it easier to see the magnitude of changes and trends compared to line plots.
- **Displaying cumulative totals**: Area plots are effective in showing cumulative totals over a period. They can help in understanding the contribution of different categories to the total amount and how these contributions evolve.
- **Comparing multiple categories**: Rather than providing a single snapshot of the composition of a total, area plots show how contributions from each category change over time. The different colored or shaded areas help distinguish each category, making it easier to see their individual contributions and to compare how those categories evolve.
- **Comparing multiple categories**: Rather than providing a single snapshot of the composition of a total, area plots show how contributions from each category change over time.

## Examples

### A basic area plot

Visualize the relationship between two variables. In this case, an area plot is similar to a line plot.
Visualize the relationship between two variables by passing each column name to the `x` and `y` arguments.

```python order=area_plot,usa_population
import deephaven.plot.express as dx
gapminder = dx.data.gapminder() # import a ticking version of the Gapminder dataset
gapminder = dx.data.gapminder()

# subset to get a specific group
usa_population = gapminder.where("country == `United States`")
usa_population = gapminder.where("Country == `United States`")

# create a basic area plot by specifying columns for the `x` and `y` axes
area_plot = dx.area(usa_population, x="year", y="pop")
area_plot = dx.area(usa_population, x="Year", y="Pop")
```

### Color by group
### Area by group

Area plots are unique in that the y-axis demonstrates each group's total contribution to the whole. Use the `by` argument to specify a grouping column.
Area plots are unique in that the y-axis demonstrates each group's total contribution to the whole. Pass the name of the grouping column(s) to the `by` argument.

```python order=area_plot_multi,large_countries_population
```python order=area_plot_group,large_countries_population
import deephaven.plot.express as dx
gapminder = dx.data.gapminder() # import a ticking version of the Gapminder dataset
gapminder = dx.data.gapminder()

# subset to get a few categories to compare
large_countries_population = gapminder.where("country in `United States`, `India`, `China`")
# subset to get several countries to compare
large_countries_population = gapminder.where("Country in `United States`, `India`, `China`")

# the `by` uses unique values in the supplied column to color the plot according to those column values
area_plot_multi = dx.area(large_countries_population, x="year", y="pop", by="country")
# cumulative trend showing contribution from each group
area_plot_group = dx.area(large_countries_population, x="Year", y="Pop", by="Country")
```
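
The "contribution to the whole" idea can be sketched without any plotting library. The snippet below is a minimal illustration, assuming the bands are stacked as the text describes; the group names and figures are hypothetical, and `stack_layers` is an illustrative helper, not part of `deephaven.plot.express`.

```python
from itertools import accumulate

# Hypothetical values for three groups at three points in time.
populations = {
    "A": [10, 12, 15],
    "B": [20, 25, 30],
    "C": [5, 5, 10],
}


def stack_layers(series_by_group):
    """Return the upper boundary of each group's band in a stacked area plot."""
    # Each column holds every group's value at one point in time.
    columns = zip(*series_by_group.values())
    # A running sum down each column gives the top edge of each band.
    cumulative_columns = [list(accumulate(column)) for column in columns]
    # Transpose back so that each group maps to its boundary over time.
    boundaries = (list(boundary) for boundary in zip(*cumulative_columns))
    return dict(zip(series_by_group.keys(), boundaries))


stacked = stack_layers(populations)
print(stacked)
```

The top edge of the last band equals the total across groups at each time point, which is why a stacked area plot shows both the individual contributions and the overall trend.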

## API Reference
33 changes: 22 additions & 11 deletions plugins/plotly-express/docs/bar.md
@@ -1,6 +1,6 @@
# Bar Plot

A bar plot is a graphical representation of data that uses rectangular bars to display the values of different categories or groups, making it easy to compare and visualize the distribution of data.
A bar plot is a graphical representation of data that uses rectangular bars to display the values of different categories or groups. Bar plots aggregate the response variable across the entire dataset for each category, so that the y-axis represents the sum of the response variable per category.

Bar plots are appropriate when the data contain a continuous response variable that is directly related to a categorical explanatory variable. Additionally, if the response variable is a cumulative total of contributions from different subcategories, each bar can be broken up to demonstrate those contributions.

@@ -14,29 +14,40 @@ Bar plots are appropriate when the data contain a continuous response variable t

### A basic bar plot

Visualize the relationship between a continuous variable and a categorical or discrete variable. By default, the y-axis shows the cumulative value for each group over the whole dataset.
Visualize the relationship between a continuous variable and a categorical or discrete variable by passing the column names to the `x` and `y` arguments.

```python order=bar_plot,tips
import deephaven.plot.express as dx
tips = dx.data.tips() # import a ticking version of the Tips dataset
tips = dx.data.tips()

# create a basic bar plot by specifying columns for the `x` and `y` axes
bar_plot = dx.bar(tips, x="day", y="total_bill")
bar_plot = dx.bar(tips, x="Day", y="TotalBill")
```
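
The aggregation that determines each bar's height can be sketched in plain Python. This is only an illustration of summing a response variable per category, assuming the behavior described above; the rows are made up and `sum_by_category` is a hypothetical helper, not a Deephaven function.

```python
from collections import defaultdict

# Hypothetical (day, total bill) rows.
rows = [("Thur", 10.0), ("Fri", 20.0), ("Thur", 5.5), ("Sat", 7.25), ("Fri", 1.0)]


def sum_by_category(rows):
    """Sum the response variable for each category."""
    totals = defaultdict(float)
    for category, value in rows:
        totals[category] += value
    return dict(totals)


bar_heights = sum_by_category(rows)

# Sorting the category labels gives an alphabetical x-axis ordering.
ordered_categories = sorted(bar_heights)
print(bar_heights, ordered_categories)
```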

Change the x-axis ordering by sorting the dataset by the categorical variable.

```python order=ordered_bar_plot,tips
import deephaven.plot.express as dx
tips = dx.data.tips()

# sort the dataset to get a specific x-axis ordering; sort() orders string values alphabetically
ordered_bar_plot = dx.bar(tips.sort("Day"), x="Day", y="TotalBill")
```

### Partition bars by group

Use the `by` argument to break each bar up into contributions from the given group.
Break bars down by group by passing the name of the grouping column(s) to the `by` argument.

```python order=bar_plot_smoke,bar_plot_sex,tips
import deephaven.plot.express as dx
tips = dx.data.tips() # import a ticking version of the Tips dataset
tips = dx.data.tips()

sorted_tips = tips.sort("Day")

# Ex 1. Partition bars by smoker / non-smoker
bar_plot_smoke = dx.bar(tips, x="day", y="total_bill", by="smoker")
# group by smoker / non-smoker
bar_plot_smoke = dx.bar(sorted_tips, x="Day", y="TotalBill", by="Smoker")

# Ex 2. Partition bars by male / female
bar_plot_sex = dx.bar(tips, x="day", y="total_bill", by="sex")
# group by male / female
bar_plot_sex = dx.bar(sorted_tips, x="Day", y="TotalBill", by="Sex")
```

## API Reference
31 changes: 16 additions & 15 deletions plugins/plotly-express/docs/box.md
@@ -1,42 +1,43 @@
# Box Plot

A box plot, also known as a box-and-whisker plot, is a data visualization that presents a summary of a dataset's distribution. It displays key statistics such as the median, quartiles, and potential outliers, making it a useful tool for visually representing the central tendency and variability of data.
A box plot, also known as a box-and-whisker plot, is a data visualization that presents a summary of a dataset's distribution. It displays key statistics such as the median, quartiles, and potential outliers, making it a useful tool for visually representing the central tendency and variability of data. To learn more about the mathematics involved in creating box plots, check out [this article](https://asq.org/quality-resources/box-whisker-plot).

Box plots are appropriate when the data have a continuous variable of interest. If there is an additional categorical variable that the variable of interest depends on, side-by-side box plots may be appropriate.
Box plots are appropriate when the data have a continuous variable of interest. If there is an additional categorical variable that the variable of interest depends on, side-by-side box plots, created with the `by` argument, may be appropriate.

### What are box plots useful for?

- **Visualizing overall distribution**: Box plots reveal the distribution of the variable of interest. They are good first-line tools for assessing whether a variable's distribution is symmetric, right-skewed, or left-skewed.
- **Assessing center and spread**: A box plot displays the center (median) of a dataset using the middle line, and displays the spread (IQR) using the width of the box.
- **Identifying potential outliers**: The dots displayed outside of the fenceposts in a box plot are considered candidates for being outliers. These should be examined closely, and their frequency can help determine whether the data come from a heavy-tailed distribution.
- **Identifying potential outliers**: The dots displayed in a box plot are considered candidates for being outliers. These should be examined closely, and their frequency can help determine whether the data come from a heavy-tailed distribution.

## Examples

### A basic box plot

Visualize the distribution of a single continuous variable using a box plot. Singular points lying outside the "fences" are candidates for being outliers.
Visualize the distribution of a single variable by passing the column name to `x` or `y`.

```python order=total_bill_plot,tips
```python order=box_plot_x,box_plot_y,tips
import deephaven.plot.express as dx
tips = dx.data.tips() # import a ticking version of the Tips dataset
tips = dx.data.tips()

# create a basic box plot by specifying the variable of interest with `y`
total_bill_plot = dx.box(tips, y="total_bill")
# control the plot orientation using `x` or `y`
box_plot_x = dx.box(tips, x="TotalBill")
box_plot_y = dx.box(tips, y="TotalBill")
```
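
The statistics a box plot summarizes can be sketched with the standard library. The snippet below assumes the common convention of fences at 1.5 times the IQR beyond the quartiles and uses Python's "inclusive" quartile method; the plotting library may compute quartiles slightly differently. `box_summary` and the sample values are hypothetical.

```python
from statistics import quantiles


def box_summary(values, whisker=1.5):
    """Compute the quartiles, fences, and outlier candidates for a box plot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    # Points beyond the fences are candidates for being outliers.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "outliers": outliers,
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```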

### Distributions for multiple groups

Box plots are useful for comparing the distributions of two or more groups of data. Use the `by` argument to specify a grouping column.
Box plots are useful for comparing the distributions of two or more groups of data. Pass the name of the grouping column(s) to the `by` argument.

```python order=total_bill_smoke,total_bill_sex,tips
```python order=box_plot_group_1,box_plot_group_2,tips
import deephaven.plot.express as dx
tips = dx.data.tips() # import a ticking version of the Tips dataset
tips = dx.data.tips()

# Ex 1. Total bill distribution by smoker / non-smoker
total_bill_smoke = dx.box(tips, y="total_bill", by="smoker")
# total bill distribution by smoker / non-smoker
box_plot_group_1 = dx.box(tips, y="TotalBill", by="Smoker")

# Ex 2. Total bill distribution by male / female
total_bill_sex = dx.box(tips, y="total_bill", by="sex")
# total bill distribution by male / female
box_plot_group_2 = dx.box(tips, y="TotalBill", by="Sex")
```

## API Reference
31 changes: 15 additions & 16 deletions plugins/plotly-express/docs/candlestick.md
@@ -8,42 +8,41 @@ In a bullish (upward, typically shown as green) candlestick, the open is typical

### What are candlestick plots useful for?

- **Analyzing financial markets**: Candlestick plots are a standard tool in technical analysis for understanding price movements, identifying trends, and potential reversal points in financial markets, such as stocks, forex, and cryptocurrencies.
- **Analyzing financial markets**: Candlestick plots are a standard tool in technical analysis for understanding price movements and for identifying trends and potential reversal points in financial instruments, such as stocks, forex, and cryptocurrencies.
- **Short to medium-term trading**: Candlestick patterns are well-suited for short to medium-term trading strategies, where timely decisions are based on price patterns and trends over a specific time frame.
- **Visualizing variation in price data**: Candlestick plots offer a visually intuitive way to represent variability in price data, making them valuable for traders and analysts who prefer a visual approach to data analysis.

## Examples

### A basic candlestick plot

Visualize the key summary statistics of a single continuous variable as it evolves.
Visualize the key summary statistics of a stock price as it evolves. Pass the name of the timestamp column to `x`, and pass the appropriate column names to the `open`, `high`, `low`, and `close` arguments.

```python order=candlestick_plot,stocks_1min_ohlc,stocks
import deephaven.plot.express as dx
import deephaven.agg as agg
stocks = dx.data.stocks() # import the example stock market data set
stocks = dx.data.stocks()

# compute ohlc per symbol for each minute
stocks_1min_ohlc = stocks.update_view(
"binnedTimestamp = lowerBin(timestamp, 'PT1m')"
"BinnedTimestamp = lowerBin(Timestamp, 'PT1m')"
).agg_by(
[
agg.first("open=price"),
agg.max_("high=price"),
agg.min_("low=price"),
agg.last("close=price"),
agg.first("Open=Price"),
agg.max_("High=Price"),
agg.min_("Low=Price"),
agg.last("Close=Price"),
],
by=["sym", "binnedTimestamp"],
by=["Sym", "BinnedTimestamp"],
)

# create a basic candlestick plot - the `open`, `high`, `low`, and `close` arguments must be specified
candlestick_plot = dx.candlestick(
stocks_1min_ohlc.where("sym == `DOG`"),
x="binnedTimestamp",
open="open",
high="high",
low="low",
close="close",
stocks_1min_ohlc.where("Sym == `DOG`"),
x="BinnedTimestamp",
open="Open",
high="High",
low="Low",
close="Close",
)
```
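
The per-minute aggregation feeding the candlestick can be sketched in plain Python to show what each of the four values means. This is an illustration only, not how Deephaven computes it: the trades are invented and `ohlc_per_minute` is a hypothetical helper that mirrors the first/max/min/last aggregations over one-minute bins.

```python
from datetime import datetime

# Hypothetical (timestamp, price) trades for a single symbol.
trades = [
    (datetime(2024, 1, 1, 9, 30, 5), 100.0),
    (datetime(2024, 1, 1, 9, 30, 40), 103.0),
    (datetime(2024, 1, 1, 9, 30, 55), 99.0),
    (datetime(2024, 1, 1, 9, 31, 10), 101.0),
    (datetime(2024, 1, 1, 9, 31, 30), 104.0),
]


def ohlc_per_minute(trades):
    """Group trades into one-minute bins and compute open/high/low/close."""
    bars = {}
    for timestamp, price in sorted(trades):
        # Truncate to the start of the minute, similar to a lower-bin operation.
        minute = timestamp.replace(second=0, microsecond=0)
        if minute not in bars:
            bars[minute] = {"open": price, "high": price, "low": price, "close": price}
        else:
            bar = bars[minute]
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # the latest trade seen so far in this bin
    return bars


bars = ohlc_per_minute(trades)
for minute, bar in bars.items():
    print(minute, bar)
```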

77 changes: 49 additions & 28 deletions plugins/plotly-express/docs/density_heatmap.md
@@ -1,63 +1,59 @@
# Density Heatmap Plot

A density heatmap plot is a data visualization that uses a colored grid to represent a count over two columns or more (generally an aggregation over three columns). The grid is divided into cells colored based on the aggregated value of the data points that fall within each cell. Passing in one independent variable and one dependent variable provides an approximating replacement for a scatter plot when there are too many data points to be easily visualized. Providing two independent variables and a third dependent variable allows for a more general aggregation to assess a specific metric of the data distribution. The number of grid bins significantly impacts the visualization. Currently, the grid bins default to 10 on each axis.
A density heatmap plot is a data visualization that uses a colored grid to represent the joint distribution of a pair of continuous variables. More generally, density heatmaps can be used to visualize any statistical aggregation over a pair of continuous variables. The pair of continuous variables may be explanatory and response variables. In this case, a density heatmap provides an approximation to a scatter plot when there are too many data points to be easily visualized. The number of grid bins significantly impacts the visualization. Currently, the grid bins default to 10 on each axis, yielding 100 bins in total.

#### When are density heatmap plots appropriate?

Density heatmap plots are appropriate when the data contains two continuous variables of interest and optionally a third dependent variable.
Density heatmaps are appropriate when the data contain two continuous variables of interest. An additional quantitative variable may be incorporated into the visualization using shapes or colors.

#### What are density heatmap plots useful for?

- **Scatter Plot Replacement**: When dealing with a large number of data points, density heatmaps provide a more concise, informative and performant visualization than a scatter plot.
- **Scatter Plot Replacement**: When dealing with a large number of data points, density heatmaps provide a more concise, informative and performant visualization than a [scatter plot](scatter.md).
- **2D Density Estimation**: Density heatmaps can serve as the basis for 2D density estimation methods, helping to model and understand underlying data distributions, which is crucial in statistical analysis and machine learning.
- **Metric Assessment**: By aggregating data points within each cell, density heatmaps can provide insights into the distribution of a specific metric or value across different regions, highlighting groups for further analysis.

## Examples

### A basic density heatmap

Visualize the counts of data points between two continuous variables within a grid. This could possibly replace a scatter plot when there are too many data points to be easily visualized.
Visualize the joint distribution of two variables by passing each column name to the `x` and `y` arguments.

```python order=heatmap,iris
import deephaven.plot.express as dx
iris = dx.data.iris()

# Create a basic density heatmap by specifying columns for the `x` and `y` axes
heatmap = dx.density_heatmap(iris, x="petal_length", y="petal_width")
heatmap = dx.density_heatmap(iris, x="PetalLength", y="PetalWidth")
```
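
The binning behind a density heatmap can be sketched in a few lines of plain Python. The snippet assumes equal-width bins and the default of 10 bins per axis mentioned above; `bin_index`, `density_grid`, and the sample points are hypothetical and do not reflect the library's internals.

```python
def bin_index(value, low, high, nbins):
    """Map a value in [low, high] to a bin index in [0, nbins - 1]."""
    if value == high:
        return nbins - 1  # the upper edge belongs to the last bin
    return int((value - low) / (high - low) * nbins)


def density_grid(points, x_range, y_range, nbins=10):
    """Count how many points fall into each cell of an nbins x nbins grid."""
    grid = [[0] * nbins for _ in range(nbins)]
    for x, y in points:
        inside_x = x_range[0] <= x <= x_range[1]
        inside_y = y_range[0] <= y <= y_range[1]
        if not (inside_x and inside_y):
            continue  # points outside the requested range are ignored
        col = bin_index(x, x_range[0], x_range[1], nbins)
        row = bin_index(y, y_range[0], y_range[1], nbins)
        grid[row][col] += 1
    return grid


points = [(0.5, 0.5), (0.7, 0.2), (9.5, 9.5), (10.0, 10.0), (4.2, 7.9)]
grid = density_grid(points, (0.0, 10.0), (0.0, 10.0))
print(grid[0][0], grid[9][9], grid[7][4])
```

Each cell's count is what the color scale encodes, so a cell holding many points appears "hotter" than a sparsely populated one.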

### A density heatmap with a custom color scale

Visualize the counts of data points between two continuous variables within a grid with a custom color scale.
Custom color scales can be provided to the `color_continuous_scale` argument, and their range can be defined with the `range_color` argument.

```py order=heatmap_colorscale,iris
import deephaven.plot.express as dx
iris = dx.data.iris() # Import a ticking version of the Iris dataset
iris = dx.data.iris()

# Color the heatmap using the "viridis" color scale with a range from 5 to 8
heatmap_colorscale = dx.density_heatmap(
iris,
x="petal_length",
y="petal_width",
# use the "viridis" color scale with a range from 5 to 8
heatmap_colorscale = dx.density_heatmap(iris,
x="PetalLength",
y="PetalWidth",
color_continuous_scale="viridis",
range_color=[5, 8]
)
```

### A density heatmap with a custom grid size and range

Visualize the counts of data points between two continuous variables within a grid with a custom grid size and range. The number of bins significantly impacts the visualization by changing the granularity of the grid.
The number of bins on each axis can be set using the `nbinsx` and `nbinsy` arguments. The number of bins significantly impacts the visualization by changing the granularity of the grid.

```py order=heatmap_bins,iris
import deephaven.plot.express as dx
iris = dx.data.iris() # import a ticking version of the Iris dataset
iris = dx.data.iris()

# Create a density heatmap with 20 bins on each axis and a range from 3 to the maximum value for the x-axis.
# None is used to specify an upper bound of the maximum value.
heatmap_bins = dx.density_heatmap(
iris,
x="petal_length",
y="petal_width",
x="PetalLength",
y="PetalWidth",
nbinsx=20,
nbinsy=20,
range_bins_x=[3, None],
@@ -66,23 +62,48 @@ heatmap_bins = dx.density_heatmap(

### A density heatmap with a custom aggregation function

Visualize the average of a third dependent continuous variable across the grid. Histfuncs can only be used when three columns are provided. Possible histfuncs are `"abs_sum"`, `"avg"`, `"count"`, `"count_distinct"`, `"max"`, `"median"`, `"min"`, `"std"`, `"sum"`, and `"var"`.
Use an additional continuous variable to color the heatmap. Many statistical aggregations can be computed on this column by providing the `histfunc` argument. Possible values for the `histfunc` are `"abs_sum"`, `"avg"`, `"count"`, `"count_distinct"`, `"max"`, `"median"`, `"min"`, `"std"`, `"sum"`, and `"var"`.

```py order=heatmap_aggregation,iris

import deephaven.plot.express as dx
iris = dx.data.iris() # import a ticking version of the Iris dataset
iris = dx.data.iris()

# Create a density heatmap with an average aggregation function.
heatmap_aggregation = dx.density_heatmap(
iris,
x="petal_length",
y="petal_width",
z="sepal_length",
# color the map by the average of an additional continuous variable
heatmap_aggregation = dx.density_heatmap(iris,
x="PetalLength",
y="PetalWidth",
z="SepalLength",
histfunc="avg"
)
```
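
What an `"avg"` aggregation over a third variable means per cell can be sketched as follows. This is a simplified illustration that assumes equal-width bins and that all points lie within the given ranges; `binned_average` and the sample data are hypothetical.

```python
from collections import defaultdict


def binned_average(points, x_range, y_range, nbins=10):
    """Average z within each (row, col) cell; assumes every point is in range."""

    def bin_index(value, low, high):
        # Clamp so that a value on the upper edge lands in the last bin.
        return min(int((value - low) / (high - low) * nbins), nbins - 1)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, z in points:
        cell = (bin_index(y, *y_range), bin_index(x, *x_range))
        sums[cell] += z
        counts[cell] += 1
    # Only cells that contain at least one point receive a value.
    return {cell: sums[cell] / counts[cell] for cell in counts}


points = [(0.5, 0.5, 2.0), (0.7, 0.2, 4.0), (9.5, 9.5, 10.0)]
averages = binned_average(points, (0.0, 10.0), (0.0, 10.0))
print(averages)
```

Swapping the final division for a different reduction (for example a sum, minimum, or maximum) gives the analogue of the other aggregation choices.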

### Large datasets

Visualize the joint distribution of a large dataset (10 million rows in this example) by passing each column name to the `x` and `y` arguments. Increasing the number of bins can produce a much smoother visualization.

```python order=large_heatmap_2,large_heatmap_1,large_data
from deephaven.plot import express as dx
from deephaven import empty_table

large_data = empty_table(10_000_000).update([
"X = 50 + 25 * cos(i * Math.PI / 180)",
"Y = 50 + 25 * sin(i * Math.PI / 180)",
])

# specify range to see entire plot
large_heatmap_1 = dx.density_heatmap(large_data, x="X", y="Y", range_bins_x=[0,100], range_bins_y=[0,100])

# using bins may be useful for more precise visualizations
large_heatmap_2 = dx.density_heatmap(
large_data,
x="X",
y="Y",
range_bins_x=[0,100],
range_bins_y=[0,100],
nbinsx=100,
nbinsy=100
)
```
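
To see why more bins produce a smoother picture for this dataset, the sketch below regenerates a small version of the same circular data in plain Python and counts how many grid cells contain at least one point. It is a simplified illustration using 360 rows rather than 10 million; `occupied_cells` is a hypothetical helper.

```python
import math

# One full revolution of the formulas used above: a circle of radius 25
# centered at (50, 50).
points = [
    (50 + 25 * math.cos(i * math.pi / 180), 50 + 25 * math.sin(i * math.pi / 180))
    for i in range(360)
]


def occupied_cells(points, low, high, nbins):
    """Return the set of (row, col) cells that contain at least one point."""
    cells = set()
    for x, y in points:
        col = min(int((x - low) / (high - low) * nbins), nbins - 1)
        row = min(int((y - low) / (high - low) * nbins), nbins - 1)
        cells.add((row, col))
    return cells


coarse = occupied_cells(points, 0, 100, 10)
fine = occupied_cells(points, 0, 100, 100)
print(len(coarse), len(fine))
```

Every point lies on the same ring, so only cells along the ring are ever populated. A finer grid traces that ring with many small cells instead of a few large ones.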

## API Reference
```{eval-rst}