docs: MVP plotly-express docs #554

Merged
merged 39 commits into from
Jul 29, 2024
39 commits
d124531 Docs (alexpeters1208, Jun 11, 2024)
af94f9f Continue docs (alexpeters1208, Jun 12, 2024)
6907460 More docs (alexpeters1208, Jun 13, 2024)
8b56f55 Add polar and ternary examples (alexpeters1208, Jun 18, 2024)
2fbf7b2 Add multiple-axes (alexpeters1208, Jun 20, 2024)
03c067b Start re-wording (alexpeters1208, Jun 20, 2024)
275aa7d Fix spacing (alexpeters1208, Jun 20, 2024)
50ee789 Start "what are they useful for" (alexpeters1208, Jun 20, 2024)
566fadc Small language changes (alexpeters1208, Jun 20, 2024)
6d15404 When are they appropriate (alexpeters1208, Jun 21, 2024)
ad907ba Update ecdf (alexpeters1208, Jun 21, 2024)
1e21a4d Merge branch 'main' into dx-min-docs (alexpeters1208, Jun 21, 2024)
b615e3e Update notes and warnings (alexpeters1208, Jun 21, 2024)
1ad4acd Simplify "what are they useful for" (alexpeters1208, Jun 21, 2024)
d872a90 Don Area suggestion (alexpeters1208, Jun 24, 2024)
d39b2b7 More review suggestions (alexpeters1208, Jun 28, 2024)
643003a Funnel plot (alexpeters1208, Jul 2, 2024)
6eb71fb Funnel, funnel area, timeline (alexpeters1208, Jul 3, 2024)
9ec22f5 Merge branch 'main' into dx-min-docs (alexpeters1208, Jul 9, 2024)
ab8ebf8 Start plot by (alexpeters1208, Jul 10, 2024)
19dd7b3 More plot by (alexpeters1208, Jul 10, 2024)
0cb05ce Apply suggestions from code review (alexpeters1208, Jul 15, 2024)
143249e See how area plot renders (alexpeters1208, Jul 15, 2024)
f9436bf Check rendering (alexpeters1208, Jul 15, 2024)
a47adfe More tidying (alexpeters1208, Jul 16, 2024)
d6638b6 More plot by (alexpeters1208, Jul 16, 2024)
86c1b73 First round of revisions from Chip (alexpeters1208, Jul 24, 2024)
4e4cc91 Scatter progress (alexpeters1208, Jul 25, 2024)
6ef84bc Merge branch 'main' into dx-min-docs (alexpeters1208, Jul 25, 2024)
b4a186f Revise scatter (alexpeters1208, Jul 25, 2024)
ab8f2c4 More polish (alexpeters1208, Jul 25, 2024)
778246e More polish, Don review, add density heatmap (alexpeters1208, Jul 25, 2024)
b5665a5 Pascal case (alexpeters1208, Jul 26, 2024)
a562b64 Links (alexpeters1208, Jul 26, 2024)
a432741 Revise concept pieces (alexpeters1208, Jul 26, 2024)
c408bf2 Deterministic large datasets (alexpeters1208, Jul 26, 2024)
f3c7ebe Chip review (alexpeters1208, Jul 26, 2024)
83fcb87 Chip and Joe suggestions (alexpeters1208, Jul 26, 2024)
6dc0ed6 Move density heatmap up (alexpeters1208, Jul 29, 2024)
3 changes: 3 additions & 0 deletions .gitignore
@@ -35,3 +35,6 @@ playwright/.cache/
# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*
replay_pid*

plugins-venv/
plugins-dev-venv/
17 changes: 17 additions & 0 deletions plugins/plotly-express/docs/README.md
@@ -117,6 +117,23 @@ my_plot = dx.line(table=my_table, x="timestamp", y="price", color="sym")

In this example, we create a Deephaven table and create a line plot of `timestamp` against `price` with automatic downsampling. A trace is created for each value in the `sym` column, each of which has a unique color.

## Documentation Terminology

The documentation for Deephaven Express routinely uses some common terms to help clarify how plots are intended to be used:

- **Variable**: Variables, usually represented as columns in a Deephaven table, are a series of data points or observations of a particular characteristic in the data set. Examples include age, GDP, stock price, wind direction, sex, zip code, shoe size, square footage, and height.

The following terms define different types of variables. Variable types are important because any given plot is usually intended for a specific variable type:

- **Categorical variable**: This is a variable with a countable (often small) number of possible measurements for which an average is not meaningful. Examples include sex, country, flower species, stock symbol, and last name. Zip code is also a categorical variable: while it is made of numbers and can technically be averaged, the "average zip code" is not a sensible concept.
- **Discrete numeric variable** (often abbreviated to _discrete variable_): This is a variable with a countable number of possible measurements for which an average can be computed. These are typically represented with whole numbers. Examples include the number of wins in a season, number of bedrooms in a house, the size of one's immediate family, and the number of letters in a word.
- **Continuous numeric variable** (often abbreviated to _continuous variable_): This is a variable with a spectrum of possible measurements for which an average can be computed. These are typically represented with decimal or fractional numbers. Examples include height, square footage of a home, length of a flower petal, price of a stock, and the distance between two stars.

The following terms define relationships between variables. They do not describe attributes of a variable, but describe how a variable relates to others:

- **Explanatory variable**: A variable that other variables depend on in some important way. The most common example is time. If explanatory variables are displayed in a plot, they are presented on the x-axis by convention.
- **Response variable**: A variable that depends directly on another variable (the explanatory variable) in some important way. A rule of thumb is that explanatory variables are used to make predictions about response variables, but not conversely. If response variables are displayed in a plot, they are presented on the y-axis by convention.

## Contributing

We welcome contributions to Deephaven Plotly Express! If you encounter any issues, have ideas for improvements, or would like to add new features, please open an issue or submit a pull request on the [GitHub repository](https://github.com/deephaven/deephaven-plugins).
41 changes: 35 additions & 6 deletions plugins/plotly-express/docs/area.md
@@ -1,16 +1,45 @@
# Area Plot

An area plot, also known as a stacked area chart, is a data visualization that uses multiple filled areas stacked on top of one another to represent the cumulative contribution of distinct categories over a continuous interval or time period. This makes it valuable for illustrating the composition and trends within data, especially when comparing the distribution of different categories.
An area plot, also known as a stacked area chart, is a data visualization that uses multiple filled areas stacked on top of one another to represent the cumulative contribution of distinct categories over a continuous interval or time period. Area plots always start the y-axis at zero, because the height of each filled band at any point equals that category's contribution to the whole, and the proportion of each category's contribution must be represented faithfully.

Area plots are useful for:
Area plots are appropriate when the data contain a continuous response variable that directly depends on a continuous explanatory variable, such as time. Further, the response variable should break down into contributions from each of several independent categories, represented by an additional categorical variable.

1. **Comparing Category Trends**: Use area plots to compare and track trends in different categories over time, providing a clear view of their cumulative contributions.
2. **Proportional Representation**: When you need to show the relative proportion of different categories within a dataset, area plots offer an effective means of visualizing this information.
3. **Data Composition**: Area plots are ideal for revealing the composition and distribution of data categories, making them useful in scenarios where the relative makeup of categories is crucial.
4. **Time Series Analysis**: For time-dependent data, area plots are valuable for displaying changes in categorical contributions and overall trends over time.
### What are area plots useful for?

- **Visualizing trends over time**: Area plots are great for displaying the trend of a single continuous variable. The filled areas can make it easier to see the magnitude of changes and trends compared to line plots.
- **Displaying cumulative totals**: Area plots are effective in showing cumulative totals over a period. They can help in understanding the contribution of different categories to the total amount and how these contributions evolve.
- **Comparing multiple categories**: Rather than providing a single snapshot of the composition of a total, area plots show how contributions from each category change over time.

## Examples

### A basic area plot

Visualize the relationship between two variables by passing each column name to the `x` and `y` arguments.

```python order=area_plot,usa_population
import deephaven.plot.express as dx
gapminder = dx.data.gapminder()

# subset to get a specific group
usa_population = gapminder.where("Country == `United States`")

area_plot = dx.area(usa_population, x="Year", y="Pop")
```

### Area by group

Area plots are unique in that the y-axis shows each group's total contribution to the whole. Pass the name of the grouping column(s) to the `by` argument.

```python order=area_plot_group,large_countries_population
import deephaven.plot.express as dx
gapminder = dx.data.gapminder()

# subset to get several countries to compare
large_countries_population = gapminder.where("Country in `United States`, `India`, `China`")

# cumulative trend showing contribution from each group
area_plot_group = dx.area(large_countries_population, x="Year", y="Pop", by="Country")
```
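The stacking itself is simple arithmetic: each category's band starts where the previous one ends, so the top of the last band equals the total. The plain-Python sketch below computes the lower and upper edge of every band. It is an illustration, not Deephaven Express internals, and it assumes each category's values are already aligned on the same x values (for example, the same years); the `stack_bands` name and the sample numbers are made up.

```python
def stack_bands(series):
    """Return {category: [(lower, upper), ...]} describing a stacked area plot.

    `series` maps each category to its y values, all aligned on the same x values.
    Categories are stacked in insertion order, starting from zero.
    """
    lengths = {len(values) for values in series.values()}
    if len(lengths) > 1:
        raise ValueError("every category needs the same number of points")
    n = lengths.pop() if lengths else 0
    baseline = [0] * n  # the stack always starts at zero
    bands = {}
    for category, values in series.items():
        upper = [base + value for base, value in zip(baseline, values)]
        bands[category] = list(zip(baseline, upper))
        baseline = upper  # the next category sits on top of this one
    return bands


# made-up values for three x positions
bands = stack_bands({"A": [1, 2, 3], "B": [4, 4, 4]})
print(bands["B"])  # [(1, 5), (2, 6), (3, 7)]
```

The upper edge of the last band is the running total at each x value, which is why the y-axis of an area plot must start at zero.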

## API Reference
```{eval-rst}
51 changes: 44 additions & 7 deletions plugins/plotly-express/docs/bar.md
@@ -1,18 +1,55 @@
# Bar Plot

A bar plot is a graphical representation of data that uses rectangular bars to display the values of different categories or groups, making it easy to compare and visualize the distribution of data.
A bar plot is a graphical representation of data that uses rectangular bars to display the values of different categories or groups. Bar plots aggregate the response variable across the entire dataset for each category, so that the y-axis represents the sum of the response variable per category.

Advantages of bar plots include:
Bar plots are appropriate when the data contain a continuous response variable that is directly related to a categorical explanatory variable. Additionally, if the response variable is a cumulative total of contributions from different subcategories, each bar can be broken up to demonstrate those contributions.

1. **Comparative Clarity**: Bar plots are highly effective for comparing data across different categories or groups. They provide a clear visual representation of relative differences and make it easy to identify trends within the dataset.
2. **Categorical Representation**: Bar plots excel at representing categorical data, such as survey responses, product sales by region, or user preferences. Each category is presented as a distinct bar, simplifying the visualization of categorical information.
3. **Ease of Use**: Bar plots are user-friendly and quick to generate, making them a practical choice for various applications.
4. **Data Aggregation**: Bar plots allow for easy aggregation of data within categories, simplifying the visualization of complex datasets, and aiding in summarizing and comparing information efficiently.
### What are bar plots useful for?

Bar plots have limitations and are not suitable for certain scenarios. They are not ideal for continuous data, ineffective for multi-dimensional data exceeding two dimensions, and unsuitable for time-series data trends. Additionally, they become less practical with extremely sparse datasets and are inadequate for representing complex interactions or correlations among multiple variables.
- **Comparing categorical data**: Bar plots are ideal for comparing the quantities or frequencies of different categories. The height of each bar represents the value of each category, making it easy to compare them at a glance.
- **Decomposing data by category**: When the data belong to several independent categories, bar plots make it easy to visualize the relative contributions of each category to the overall total. Each bar segment is colored by category, so the contribution of each is easy to identify.
- **Tracking trends**: If the categorical explanatory variable can be ordered left-to-right (like day of the week), then bar plots show how the response variable changes as the explanatory variable evolves.

## Examples

### A basic bar plot

Visualize the relationship between a continuous variable and a categorical or discrete variable by passing the column names to the `x` and `y` arguments.

```python order=bar_plot,tips
import deephaven.plot.express as dx
tips = dx.data.tips()

bar_plot = dx.bar(tips, x="Day", y="TotalBill")
```

Change the x-axis ordering by sorting the dataset by the categorical variable.

```python order=ordered_bar_plot,tips
import deephaven.plot.express as dx
tips = dx.data.tips()

# sort the dataset to control the x-axis ordering; sort() orders strings alphabetically
ordered_bar_plot = dx.bar(tips.sort("Day"), x="Day", y="TotalBill")
```
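Alphabetical order does not always match a category's natural order. Assuming the `Day` column holds abbreviations such as `Thur`, `Fri`, `Sat`, and `Sun` (an assumption worth checking against the actual table), a plain string sort places `Thur` last. The plain-Python sketch below, which does not use Deephaven, contrasts alphabetical order with an explicit ranking; the `day_rank` mapping is hypothetical.

```python
days = ["Sun", "Sat", "Thur", "Fri"]

# a plain string sort is alphabetical, so "Thur" ends up last
alphabetical = sorted(days)

# a hypothetical explicit ranking gives chronological order instead
day_rank = {"Thur": 0, "Fri": 1, "Sat": 2, "Sun": 3}
chronological = sorted(days, key=day_rank.__getitem__)

print(alphabetical)   # ['Fri', 'Sat', 'Sun', 'Thur']
print(chronological)  # ['Thur', 'Fri', 'Sat', 'Sun']
```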

### Partition bars by group

Break bars down by group by passing the name of the grouping column(s) to the `by` argument.

```python order=bar_plot_smoke,bar_plot_sex,tips
import deephaven.plot.express as dx
tips = dx.data.tips()

sorted_tips = tips.sort("Day")

# group by smoker / non-smoker
bar_plot_smoke = dx.bar(sorted_tips, x="Day", y="TotalBill", by="Smoker")

# group by male / female
bar_plot_sex = dx.bar(sorted_tips, x="Day", y="TotalBill", by="Sex")
```
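The aggregation behind these bars can be sketched in plain Python: sum the response variable per x-axis category and, when a grouping column is given, per (category, group) pair, so the segments of each bar add up to its total. This is an illustration rather than the library's implementation; the `bar_heights` function, its list-of-dicts input, and the sample rows are hypothetical.

```python
from collections import defaultdict


def bar_heights(rows, x, y, by=None):
    """Sum `y` per `x` value, optionally split into one segment per `by` value.

    Returns {x_value: total} when `by` is None, else {x_value: {by_value: subtotal}}.
    """
    if by is None:
        totals = defaultdict(float)
        for row in rows:
            totals[row[x]] += row[y]
        return dict(totals)
    segments = defaultdict(lambda: defaultdict(float))
    for row in rows:
        segments[row[x]][row[by]] += row[y]
    return {key: dict(parts) for key, parts in segments.items()}


# hypothetical rows using the column names from the examples above
rows = [
    {"Day": "Sat", "Sex": "Male", "TotalBill": 20.0},
    {"Day": "Sat", "Sex": "Female", "TotalBill": 15.0},
    {"Day": "Sun", "Sex": "Male", "TotalBill": 10.0},
]
print(bar_heights(rows, x="Day", y="TotalBill"))  # {'Sat': 35.0, 'Sun': 10.0}
print(bar_heights(rows, x="Day", y="TotalBill", by="Sex"))
# {'Sat': {'Male': 20.0, 'Female': 15.0}, 'Sun': {'Male': 10.0}}
```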

## API Reference
```{eval-rst}
.. dhautofunction:: deephaven.plot.express.bar
41 changes: 35 additions & 6 deletions plugins/plotly-express/docs/box.md
@@ -1,16 +1,45 @@
# Box Plot

A box plot, also known as a box-and-whisker plot, is a data visualization that presents a summary of a dataset's distribution. It displays key statistics such as the median, quartiles, and potential outliers, making it a useful tool for visually representing the central tendency and variability of data.
A box plot, also known as a box-and-whisker plot, is a data visualization that presents a summary of a dataset's distribution. It displays key statistics such as the median, quartiles, and potential outliers, making it a useful tool for visually representing the central tendency and variability of data. To learn more about the mathematics involved in creating box plots, check out [this article](https://asq.org/quality-resources/box-whisker-plot).

Box plots are useful for:
Box plots are appropriate when the data have a continuous variable of interest. If the variable of interest depends on an additional categorical variable, side-by-side box plots created with the `by` argument may be appropriate.

1. **Visualizing Spread and Center**: Box plots provide a clear representation of the spread and central tendency of data, making it easy to understand the distribution's characteristics.
2. **Identification of Outliers**: They are effective in identifying outliers within a dataset, helping to pinpoint data points that deviate significantly from the norm.
3. **Comparative Analysis**: Box plots allow for easy visual comparison of multiple datasets or categories, making them useful for assessing variations and trends in data.
4. **Robustness**: Box plots are robust to extreme values and data skewness, providing a reliable means of visualizing data distributions even in the presence of outliers or non-normal data.
### What are box plots useful for?

- **Visualizing overall distribution**: Box plots reveal the distribution of the variable of interest. They are good first-line tools for assessing whether a variable's distribution is symmetric, right-skewed, or left-skewed.
- **Assessing center and spread**: A box plot displays the center (median) of a dataset using the middle line, and displays the spread (IQR) using the length of the box along the value axis.
- **Identifying potential outliers**: The dots displayed in a box plot are considered candidates for being outliers. These should be examined closely, and their frequency can help determine whether the data come from a heavy-tailed distribution.

## Examples

### A basic box plot

Visualize the distribution of a single variable by passing the column name to `x` or `y`.

```python order=box_plot_x,box_plot_y,tips
import deephaven.plot.express as dx
tips = dx.data.tips()

# control the plot orientation using `x` or `y`
box_plot_x = dx.box(tips, x="TotalBill")
box_plot_y = dx.box(tips, y="TotalBill")
```
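The statistics a box plot draws can be computed by hand. The plain-Python sketch below assumes the common Tukey convention, in which whiskers stop at the most extreme points within 1.5 times the IQR of the box and anything beyond is flagged as a candidate outlier; quartiles use linear interpolation (`statistics.quantiles` with `method="inclusive"`). Plotting libraries may use a different quartile method, so treat this as an illustration; the `box_summary` name and sample values are hypothetical.

```python
import statistics


def box_summary(values, whisker_factor=1.5):
    """Summarize `values` the way a Tukey-style box plot typically does."""
    # linear-interpolation quartiles; other tools may differ slightly
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # whiskers reach the most extreme points still inside the fences
        "whiskers": (min(inside), max(inside)),
        # points beyond the fences are drawn as dots: candidate outliers
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary["median"], summary["iqr"], summary["outliers"])  # 5.0 4.0 [100]
```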

### Distributions for multiple groups

Box plots are useful for comparing the distributions of two or more groups of data. Pass the name of the grouping column(s) to the `by` argument.

```python order=box_plot_group_1,box_plot_group_2,tips
import deephaven.plot.express as dx
tips = dx.data.tips()

# total bill distribution by smoker / non-smoker
box_plot_group_1 = dx.box(tips, y="TotalBill", by="Smoker")

# total bill distribution by male / female
box_plot_group_2 = dx.box(tips, y="TotalBill", by="Sex")
```

## API Reference
```{eval-rst}
.. dhautofunction:: deephaven.plot.express.box
41 changes: 36 additions & 5 deletions plugins/plotly-express/docs/candlestick.md
@@ -6,15 +6,46 @@ Interpreting a candlestick chart involves understanding the visual representatio

In a bullish (upward, typically shown as green) candlestick, the open is typically at the bottom of the body, and the close is at the top, indicating a price increase. In a bearish (downward, typically shown as red) candlestick, the open is at the top of the body, and the close is at the bottom, suggesting a price decrease. One can use these patterns, along with the length of the wicks and the context of adjacent candlesticks, to analyze trends.

Candlestick plots are useful for:
### What are candlestick plots useful for?

1. **Analyzing Financial Markets**: They are a standard tool in technical analysis for understanding price movements, identifying trends, and potential reversal points in financial markets, such as stocks, forex, and cryptocurrencies.
2. **Short to Medium-Term Trading**: Candlestick patterns are well-suited for short to medium-term trading strategies, where timely decisions are based on price patterns and trends over a specific time frame.
3. **Pattern Recognition**: They aid in recognizing and interpreting common candlestick patterns, which can provide insights into market sentiment and potential price movements.
4. **Visualizing Variation in Price Data**: Candlestick charts offer a visually intuitive way to represent variability in price data, making them valuable for traders and analysts who prefer a visual approach to data analysis.
- **Analyzing financial markets**: Candlestick plots are a standard tool in technical analysis for understanding price movements, identifying trends, and potential reversal points in financial instruments, such as stocks, forex, and cryptocurrencies.
- **Short to medium-term trading**: Candlestick patterns are well-suited for short to medium-term trading strategies, where timely decisions are based on price patterns and trends over a specific time frame.
- **Visualizing variation in price data**: Candlestick plots offer a visually intuitive way to represent variability in price data, making them valuable for traders and analysts who prefer a visual approach to data analysis.

## Examples

### A basic candlestick plot

Visualize the key summary statistics of a stock price as it evolves. Pass the name of the timestamp column to `x`, and pass the appropriate column names to the `open`, `high`, `low`, and `close` arguments.

```python order=candlestick_plot,stocks_1min_ohlc,stocks
import deephaven.plot.express as dx
import deephaven.agg as agg
stocks = dx.data.stocks()

# compute ohlc per symbol for each minute
stocks_1min_ohlc = stocks.update_view(
    "BinnedTimestamp = lowerBin(Timestamp, 'PT1m')"
).agg_by(
    [
        agg.first("Open=Price"),
        agg.max_("High=Price"),
        agg.min_("Low=Price"),
        agg.last("Close=Price"),
    ],
    by=["Sym", "BinnedTimestamp"],
)

candlestick_plot = dx.candlestick(
    stocks_1min_ohlc.where("Sym == `DOG`"),
    x="BinnedTimestamp",
    open="Open",
    high="High",
    low="Low",
    close="Close",
)
```
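To make the query above easier to follow, here is a plain-Python sketch of the same idea under stated assumptions: rows are (symbol, timestamp, price) tuples already in time order, so "first" and "last" mean earliest and latest; `lower_bin_minute` truncates a timestamp to the start of its minute, analogous to `lowerBin(Timestamp, 'PT1m')` for UTC timestamps; and `classify_candle` encodes the bullish/bearish reading described earlier (treating open equal to close as "neutral" is an added assumption). All function names and the sample trades are hypothetical, and this is not how Deephaven computes aggregations.

```python
from datetime import datetime


def lower_bin_minute(ts):
    """Truncate `ts` to the start of its minute (analogous to lowerBin with 'PT1m')."""
    return ts.replace(second=0, microsecond=0)


def ohlc_by_minute(rows):
    """Compute open/high/low/close per (symbol, minute) from time-ordered rows."""
    bars = {}
    for sym, ts, price in rows:
        key = (sym, lower_bin_minute(ts))
        bar = bars.get(key)
        if bar is None:
            # the first price in a bin is its open (and, so far, its high, low, and close)
            bars[key] = {"Open": price, "High": price, "Low": price, "Close": price}
        else:
            bar["High"] = max(bar["High"], price)
            bar["Low"] = min(bar["Low"], price)
            bar["Close"] = price  # the latest price seen in the bin becomes its close
    return bars


def classify_candle(bar):
    """Label a bar bullish (close above open), bearish (close below open), or neutral."""
    if bar["Close"] > bar["Open"]:
        return "bullish"
    if bar["Close"] < bar["Open"]:
        return "bearish"
    return "neutral"  # open equals close; often called a doji


# made-up trades for illustration only
rows = [
    ("DOG", datetime(2024, 1, 1, 9, 30, 5), 10.0),
    ("DOG", datetime(2024, 1, 1, 9, 30, 40), 12.0),
    ("DOG", datetime(2024, 1, 1, 9, 30, 59), 11.0),
    ("DOG", datetime(2024, 1, 1, 9, 31, 2), 11.5),
]
bars = ohlc_by_minute(rows)
first_bar = bars[("DOG", datetime(2024, 1, 1, 9, 30))]
print(first_bar)  # {'Open': 10.0, 'High': 12.0, 'Low': 10.0, 'Close': 11.0}
print(classify_candle(first_bar))  # bullish
```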

## API Reference
```{eval-rst}
.. dhautofunction:: deephaven.plot.express.candlestick