docs: MVP plotly-express docs #554

Merged: 39 commits merged on Jul 29, 2024

Changes from 14 commits

Commits:
d124531 Docs (alexpeters1208, Jun 11, 2024)
af94f9f Continue docs (alexpeters1208, Jun 12, 2024)
6907460 More docs (alexpeters1208, Jun 13, 2024)
8b56f55 Add polar and ternary examples (alexpeters1208, Jun 18, 2024)
2fbf7b2 Add multiple-axes (alexpeters1208, Jun 20, 2024)
03c067b Start re-wording (alexpeters1208, Jun 20, 2024)
275aa7d Fix spacing (alexpeters1208, Jun 20, 2024)
50ee789 Start "what are they useful for" (alexpeters1208, Jun 20, 2024)
566fadc Small language changes (alexpeters1208, Jun 20, 2024)
6d15404 When are they appropriate (alexpeters1208, Jun 21, 2024)
ad907ba Update ecdf (alexpeters1208, Jun 21, 2024)
1e21a4d Merge branch 'main' into dx-min-docs (alexpeters1208, Jun 21, 2024)
b615e3e Update notes and warnings (alexpeters1208, Jun 21, 2024)
1ad4acd Simplify "what are they useful for" (alexpeters1208, Jun 21, 2024)
d872a90 Don Area suggestion (alexpeters1208, Jun 24, 2024)
d39b2b7 More review suggestions (alexpeters1208, Jun 28, 2024)
643003a Funnel plot (alexpeters1208, Jul 2, 2024)
6eb71fb Funnel, funnel area, timeline (alexpeters1208, Jul 3, 2024)
9ec22f5 Merge branch 'main' into dx-min-docs (alexpeters1208, Jul 9, 2024)
ab8ebf8 Start plot by (alexpeters1208, Jul 10, 2024)
19dd7b3 More plot by (alexpeters1208, Jul 10, 2024)
0cb05ce Apply suggestions from code review (alexpeters1208, Jul 15, 2024)
143249e See how area plot renders (alexpeters1208, Jul 15, 2024)
f9436bf Check rendering (alexpeters1208, Jul 15, 2024)
a47adfe More tidying (alexpeters1208, Jul 16, 2024)
d6638b6 More plot by (alexpeters1208, Jul 16, 2024)
86c1b73 First round of revisions from Chip (alexpeters1208, Jul 24, 2024)
4e4cc91 Scatter progress (alexpeters1208, Jul 25, 2024)
6ef84bc Merge branch 'main' into dx-min-docs (alexpeters1208, Jul 25, 2024)
b4a186f Revise scatter (alexpeters1208, Jul 25, 2024)
ab8f2c4 More polish (alexpeters1208, Jul 25, 2024)
778246e More polish, Don review, add density heatmap (alexpeters1208, Jul 25, 2024)
b5665a5 Pascal case (alexpeters1208, Jul 26, 2024)
a562b64 Links (alexpeters1208, Jul 26, 2024)
a432741 Revise concept pieces (alexpeters1208, Jul 26, 2024)
c408bf2 Deterministic large datasets (alexpeters1208, Jul 26, 2024)
f3c7ebe Chip review (alexpeters1208, Jul 26, 2024)
83fcb87 Chip and Joe suggestions (alexpeters1208, Jul 26, 2024)
6dc0ed6 Move density heatmap up (alexpeters1208, Jul 29, 2024)
3 changes: 3 additions & 0 deletions .gitignore
@@ -35,3 +35,6 @@ playwright/.cache/
# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*
replay_pid*

plugins-venv/
plugins-dev-venv/
17 changes: 17 additions & 0 deletions plugins/plotly-express/docs/README.md
@@ -108,6 +108,23 @@ my_plot = dx.line(table=my_table, x="timestamp", y="price", color="sym")

In this example, we create a Deephaven table and create a line plot of `timestamp` against `price` with automatic downsampling. A trace is created for each value in the `sym` column, each of which has a unique color.

## Documentation Terminology

The documentation for Deephaven Express will routinely use some common terms to help clarify how plots are intended to be used:

- **Variable**: Variables, usually represented as columns in a Deephaven table, are a series of data points or observations of a particular characteristic in the data set. Examples include age, GDP, stock price, wind direction, sex, zip code, shoe size, square footage, and height.

The following terms define different types of variables. Variable types are important because each kind of plot is usually intended for use with specific variable types:

- **Categorical variable**: This is a variable with a countable (often small) number of possible measurements for which an average cannot be computed. Examples include sex, country, flower species, stock symbol, and last name. Zip code is also a categorical variable, because while it is made of numbers and can technically be averaged, the "average zip code" is not a sensible concept.
- **Discrete numeric variable** (often abbreviated to _discrete variable_): This is a variable with a countable number of possible measurements for which an average can be computed. These are typically represented with whole numbers. Examples include the number of wins in a season, number of bedrooms in a house, the size of one's immediate family, and the number of letters in a word.
- **Continuous numeric variable** (often abbreviated to _continuous variable_): This is a variable with a spectrum of possible measurements for which an average can be computed. These are typically represented with decimal or fractional numbers. Examples include height, square footage of a home, length of a flower petal, price of a stock, and the distance between two stars.

The following terms define relationships between variables. They do not describe attributes of a variable, but describe how a variable relates to others:

- **Explanatory variable**: An explanatory variable is a variable that other variables depend on in some important way. The most common example is time. If there are explanatory variables displayed in a plot, they are presented on the x-axis by convention.
- **Response variable**: A response variable is a variable that depends directly on another variable (the explanatory variable) in some important way. A rule of thumb is that explanatory variables are used to make predictions about response variables, but not the other way around. If there are response variables displayed in a plot, they are presented on the y-axis by convention.

## Contributing

We welcome contributions to Deephaven Express! If you encounter any issues, have ideas for improvements, or would like to add new features, please open an issue or submit a pull request on the [GitHub repository](https://github.com/deephaven/deephaven-plugins).
44 changes: 38 additions & 6 deletions plugins/plotly-express/docs/area.md
@@ -1,16 +1,48 @@
# Area Plot

An area plot, also known as a stacked area chart, is a data visualization that uses multiple filled areas stacked on top of one another to represent the cumulative contribution of distinct categories over a continuous interval or time period. This makes it valuable for illustrating the composition and trends within data, especially when comparing the distribution of different categories.
An area plot, also known as a stacked area chart, is a data visualization that uses multiple filled areas stacked on top of one another to represent the cumulative contribution of distinct categories over a continuous interval or time period.

Area plots are useful for:
#### When are area plots appropriate?

1. **Comparing Category Trends**: Use area plots to compare and track trends in different categories over time, providing a clear view of their cumulative contributions.
2. **Proportional Representation**: When you need to show the relative proportion of different categories within a dataset, area plots offer an effective means of visualizing this information.
3. **Data Composition**: Area plots are ideal for revealing the composition and distribution of data categories, making them useful in scenarios where the relative makeup of categories is crucial.
4. **Time Series Analysis**: For time-dependent data, area plots are valuable for displaying changes in categorical contributions and overall trends over time.
Area plots are appropriate when the data contain a continuous response variable that directly depends on a continuous explanatory variable, such as time. Further, the response variable can be broken down into contributions from each of several independent categories, and those categories are represented by an additional categorical variable.

#### What are area plots useful for?

- **Visualizing Trends Over Time**: Area plots are great for displaying the trend of a single continuous variable. The filled areas can make the magnitude of changes and trends easier to see than in a line plot.
- **Displaying Cumulative Totals**: Area plots are effective in showing cumulative totals over a period. They can help in understanding the contribution of different categories to the total amount and how these contributions evolve.
- **Comparing Multiple Categories**: Rather than providing a single snapshot of the composition of a total, area plots show how contributions from each category change over time. The different colored or shaded areas help distinguish each category, making it easier to see their individual contributions and to compare how those categories evolve.

## Examples

### A basic area plot

Visualize the relationship between two variables. In this case, an area plot is similar to a line plot.

```python order=area_plot,usa_population
import deephaven.plot.express as dx
gapminder = dx.data.gapminder() # import a ticking version of the Gapminder dataset

# subset to get a specific group
usa_population = gapminder.where("country == `United States`")

# create a basic area plot by specifying columns for the `x` and `y` axes
area_plot = dx.area(usa_population, x="year", y="pop")
```

### Color by group

When an area plot shows several groups, the areas are stacked so that the y-axis shows each group's contribution to the total. Use the `by` argument to specify a grouping column.

```python order=area_plot_multi,large_countries_population
import deephaven.plot.express as dx
gapminder = dx.data.gapminder() # import a ticking version of the Gapminder dataset

# subset to get a few categories to compare
large_countries_population = gapminder.where("country in `United States`, `India`, `China`")

# `by` assigns a distinct color to each unique value in the supplied column
area_plot_multi = dx.area(large_countries_population, x="year", y="pop", by="country")
```
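When `by` is used, each group's area is drawn on top of the groups below it, so the top edge at each `x` value is the running total across groups. The plain-Python sketch below illustrates that stacking on toy values; it is not how Deephaven Express or Plotly computes the layers internally, and the helper `stack_areas` and the group labels `A`, `B`, `C` are hypothetical.

```python
from itertools import accumulate


def stack_areas(rows, groups):
    """Return the upper edge of each stacked layer at every x value.

    rows: mapping of x -> {group: value}
    groups: stacking order, from the bottom layer to the top layer
    """
    stacked = {}
    for x, values in rows.items():
        # Running total across groups: each layer starts where the previous one ends
        tops = accumulate(values.get(g, 0.0) for g in groups)
        stacked[x] = dict(zip(groups, tops))
    return stacked


# Toy values for illustration only (not taken from the Gapminder dataset)
rows = {
    2000: {"A": 1.0, "B": 2.0, "C": 3.0},
    2001: {"A": 1.5, "B": 2.0, "C": 2.5},
}
stacked = stack_areas(rows, ["A", "B", "C"])
print(stacked[2000])  # {'A': 1.0, 'B': 3.0, 'C': 6.0}
```

The top layer's edge equals the total across all groups at each `x`, which is what lets an area plot show a cumulative total and its composition at the same time.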

## API Reference
```{eval-rst}
40 changes: 34 additions & 6 deletions plugins/plotly-express/docs/bar.md
@@ -2,17 +2,45 @@

A bar plot is a graphical representation of data that uses rectangular bars to display the values of different categories or groups, making it easy to compare and visualize the distribution of data.

Advantages of bar plots include:
#### When are bar plots appropriate?

1. **Comparative Clarity**: Bar plots are highly effective for comparing data across different categories or groups. They provide a clear visual representation of relative differences and make it easy to identify trends within the dataset.
2. **Categorical Representation**: Bar plots excel at representing categorical data, such as survey responses, product sales by region, or user preferences. Each category is presented as a distinct bar, simplifying the visualization of categorical information.
3. **Ease of Use**: Bar plots are user-friendly and quick to generate, making them a practical choice for various applications.
4. **Data Aggregation**: Bar plots allow for easy aggregation of data within categories, simplifying the visualization of complex datasets, and aiding in summarizing and comparing information efficiently.
Bar plots are appropriate when the data contain a continuous response variable that is directly related to a categorical explanatory variable. Additionally, if the response variable is a cumulative total of contributions from different subcategories, each bar can be broken up to demonstrate those contributions.

Bar plots have limitations and are not suitable for certain scenarios. They are not ideal for continuous data, ineffective for multi-dimensional data exceeding two dimensions, and unsuitable for time-series data trends. Additionally, they become less practical with extremely sparse datasets and are inadequate for representing complex interactions or correlations among multiple variables.
#### What are bar plots useful for?

- **Comparing Categorical Data**: Bar plots are ideal for comparing the quantities or frequencies of different categories. The height of each bar represents the value of each category, making it easy to compare them at a glance.
- **Decomposing Data by Category**: When the data belong to several independent categories, bar plots make it easy to visualize the relative contributions of each category to the overall total. The bar segments are colored by category, making it easy to identify the contribution of each.
- **Tracking Trends**: If the categorical explanatory variable can be ordered left-to-right (like day of week), then bar plots provide a visualization of how the response variable changes as the explanatory variable evolves.

## Examples

### A basic bar plot

Visualize the relationship between a continuous variable and a categorical or discrete variable. By default, the y-axis shows the cumulative value for each group over the whole dataset.

```python order=bar_plot,tips
import deephaven.plot.express as dx
tips = dx.data.tips() # import a ticking version of the Tips dataset

# create a basic bar plot by specifying columns for the `x` and `y` axes
bar_plot = dx.bar(tips, x="day", y="total_bill")
```

### Partition bars by group

Use the `by` argument to break each bar up into contributions from the given group.

```python order=bar_plot_smoke,bar_plot_sex,tips
import deephaven.plot.express as dx
tips = dx.data.tips() # import a ticking version of the Tips dataset

# Ex 1. Partition bars by smoker / non-smoker
bar_plot_smoke = dx.bar(tips, x="day", y="total_bill", by="smoker")

# Ex 2. Partition bars by male / female
bar_plot_sex = dx.bar(tips, x="day", y="total_bill", by="sex")
```
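The text above describes each bar's height as the total of `y` within each `x` category, split into segments when `by` is given. A plain-Python sketch of that aggregation follows, assuming the values are summed; the `sum_by` helper and the toy rows are hypothetical and are not part of Deephaven Express or the Tips dataset.

```python
from collections import defaultdict


def sum_by(rows, keys, value):
    """Sum the `value` column over rows grouped by the columns named in `keys`."""
    totals = defaultdict(float)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += row[value]
    return dict(totals)


# Toy rows shaped like the Tips dataset (values are made up)
rows = [
    {"day": "Sat", "smoker": "Yes", "total_bill": 20.0},
    {"day": "Sat", "smoker": "No", "total_bill": 10.0},
    {"day": "Sun", "smoker": "No", "total_bill": 15.0},
    {"day": "Sun", "smoker": "No", "total_bill": 5.0},
]

bar_heights = sum_by(rows, ["day"], "total_bill")          # one total per bar
segments = sum_by(rows, ["day", "smoker"], "total_bill")   # one total per segment
print(bar_heights)
print(segments)
```

Summing the segments for a given day recovers that day's bar height, which is why a partitioned bar shows each group's contribution to the total.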

## API Reference
```{eval-rst}
.. autofunction:: deephaven.plot.express.bar
40 changes: 35 additions & 5 deletions plugins/plotly-express/docs/box.md
@@ -2,15 +2,45 @@

A box plot, also known as a box-and-whisker plot, is a data visualization that presents a summary of a dataset's distribution. It displays key statistics such as the median, quartiles, and potential outliers, making it a useful tool for visually representing the central tendency and variability of data.

Box plots are useful for:
#### When are box plots appropriate?

1. **Visualizing Spread and Center**: Box plots provide a clear representation of the spread and central tendency of data, making it easy to understand the distribution's characteristics.
2. **Identification of Outliers**: They are effective in identifying outliers within a dataset, helping to pinpoint data points that deviate significantly from the norm.
3. **Comparative Analysis**: Box plots allow for easy visual comparison of multiple datasets or categories, making them useful for assessing variations and trends in data.
4. **Robustness**: Box plots are robust to extreme values and data skewness, providing a reliable means of visualizing data distributions even in the presence of outliers or non-normal data.
Box plots are appropriate when the data have a continuous variable of interest. If there is an additional categorical variable that the variable of interest depends on, side-by-side box plots may be appropriate.

#### What are box plots useful for?

- **Visualizing Overall Distribution**: Box plots reveal the distribution of the variable of interest. They are good first-line tools for assessing whether a variable's distribution is symmetric, right-skewed, or left-skewed.
- **Assessing Center and Spread**: A box plot displays the center (median) of a dataset using the middle line, and displays the spread (interquartile range, or IQR) using the length of the box.
- **Identifying Potential Outliers**: The dots displayed outside of the fences in a box plot are candidates for being outliers. These should be examined closely, and their frequency can help determine whether the data come from a heavy-tailed distribution.
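The quantities in the list above can be computed directly from the data. The following plain-Python sketch assumes the common Tukey convention (fences at 1.5 × IQR beyond the quartiles) and linear-interpolation quartiles via `statistics.quantiles(..., method="inclusive")`. Deephaven Express may use a different quartile method or fence rule, and `box_summary` is a hypothetical helper.

```python
import statistics


def box_summary(values, k=1.5):
    """Summary statistics behind a box plot, using Tukey's k * IQR fences."""
    data = sorted(values)
    # "inclusive" interpolates linearly between order statistics; other
    # quartile conventions exist and can give slightly different values
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme points that are still inside the fences
        "whiskers": (inside[0], inside[-1]),
        "outlier_candidates": [v for v in data if v < lower_fence or v > upper_fence],
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Here the quartiles are 3.25 and 7.75, so the upper fence is 7.75 + 1.5 × 4.5 = 14.5; the value 100 lies beyond it and is flagged as an outlier candidate, while the upper whisker stops at 9.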

## Examples

### A basic box plot

Visualize the distribution of a single continuous variable using a box plot. Individual points lying outside the "fences" are candidates for being outliers.

```python order=total_bill_plot,tips
import deephaven.plot.express as dx
tips = dx.data.tips() # import a ticking version of the Tips dataset

# create a basic box plot by specifying the variable of interest with `y`
total_bill_plot = dx.box(tips, y="total_bill")
```

### Distributions for multiple groups

Box plots are useful for comparing the distributions of two or more groups of data. Use the `by` argument to specify a grouping column.

```python order=total_bill_smoke,total_bill_sex,tips
import deephaven.plot.express as dx
tips = dx.data.tips() # import a ticking version of the Tips dataset

# Ex 1. Total bill distribution by smoker / non-smoker
total_bill_smoke = dx.box(tips, y="total_bill", by="smoker")

# Ex 2. Total bill distribution by male / female
total_bill_sex = dx.box(tips, y="total_bill", by="sex")
```

## API Reference
```{eval-rst}
.. autofunction:: deephaven.plot.express.box
46 changes: 41 additions & 5 deletions plugins/plotly-express/docs/candlestick.md
@@ -6,15 +6,51 @@ Interpreting a candlestick chart involves understanding the visual representatio

In a bullish (upward, typically shown as green) candlestick, the open is typically at the bottom of the body, and the close is at the top, indicating a price increase. In a bearish (downward, typically shown as red) candlestick, the open is at the top of the body, and the close is at the bottom, suggesting a price decrease. One can use these patterns, along with the length of the wicks and the context of adjacent candlesticks, to analyze trends.
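The reading described above can be expressed in a few lines of plain Python. This sketch classifies a single candle from its open, high, low, and close and measures its wicks. The `Candle` class is hypothetical, and labeling an unchanged price as "neutral" is an assumption of this sketch rather than a convention stated here.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float

    @property
    def direction(self) -> str:
        # Close above open means the price rose over the interval
        if self.close > self.open:
            return "bullish"
        if self.close < self.open:
            return "bearish"
        return "neutral"  # assumption of this sketch: unchanged price

    @property
    def upper_wick(self) -> float:
        # Distance from the top of the body up to the high
        return self.high - max(self.open, self.close)

    @property
    def lower_wick(self) -> float:
        # Distance from the bottom of the body down to the low
        return min(self.open, self.close) - self.low


c = Candle(open=10.0, high=12.0, low=9.5, close=11.0)
print(c.direction, c.upper_wick, c.lower_wick)  # bullish 1.0 0.5
```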

Candlestick plots are useful for:
#### When are candlestick plots appropriate?

1. **Analyzing Financial Markets**: They are a standard tool in technical analysis for understanding price movements, identifying trends, and potential reversal points in financial markets, such as stocks, forex, and cryptocurrencies.
2. **Short to Medium-Term Trading**: Candlestick patterns are well-suited for short to medium-term trading strategies, where timely decisions are based on price patterns and trends over a specific time frame.
3. **Pattern Recognition**: They aid in recognizing and interpreting common candlestick patterns, which can provide insights into market sentiment and potential price movements.
4. **Visualizing Variation in Price Data**: Candlestick charts offer a visually intuitive way to represent variability in price data, making them valuable for traders and analysts who prefer a visual approach to data analysis.
Candlestick plots are generally only appropriate for financial data, because they require open, high, low, and close values for each time interval.

#### What are candlestick plots useful for?

Review comment (Contributor): Shorten to "Use cases"?

Reply (Contributor Author): I tried an alternative, and settled on the current version. I like the way it reads the best. Keeping this open to reassess when we get real renderings.

- **Analyzing Financial Markets**: They are a standard tool in technical analysis for understanding price movements and identifying trends and potential reversal points in financial markets such as stocks, forex, and cryptocurrencies.
- **Short- to Medium-Term Trading**: Candlestick patterns are well-suited for short- to medium-term trading strategies, where timely decisions are based on price patterns and trends over a specific time frame.
- **Visualizing Variation in Price Data**: Candlestick charts offer a visually intuitive way to represent variability in price data, making them valuable for traders and analysts who prefer a visual approach to data analysis.

## Examples

### A basic candlestick plot

Visualize the key summary statistics of a single continuous variable as it evolves over time.

```python order=candlestick_plot,stocks_1min_ohlc,stocks
import deephaven.plot.express as dx
import deephaven.agg as agg
stocks = dx.data.stocks() # import the example stock market data set

# compute OHLC per symbol for each minute
stocks_1min_ohlc = stocks.update_view(
    "binnedTimestamp = lowerBin(timestamp, 'PT1m')"
).agg_by(
    [
        agg.first("open=price"),
        agg.max_("high=price"),
        agg.min_("low=price"),
        agg.last("close=price"),
    ],
    by=["sym", "binnedTimestamp"],
)

# create a basic candlestick plot - the `open`, `high`, `low`, and `close` arguments must be specified
candlestick_plot = dx.candlestick(
    stocks_1min_ohlc.where("sym == `DOG`"),
    x="binnedTimestamp",
    open="open",
    high="high",
    low="low",
    close="close",
)
```
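The `agg_by` call above reduces every minute of trades for each symbol to four numbers. For readers who want that reduction spelled out, here is a plain-Python sketch for a single symbol. It assumes trades arrive in time order (so the first and last trade in a bin play the roles of `agg.first` and `agg.last`) and truncates timestamps to the minute, which covers only the one-minute case that `lowerBin(timestamp, 'PT1m')` handles. `ohlc_by_minute` and the sample trades are hypothetical.

```python
from datetime import datetime, timedelta


def ohlc_by_minute(trades):
    """Aggregate (timestamp, price) trades into per-minute OHLC bars.

    Assumes trades are sorted by timestamp, so the first and last trade seen
    in each bin are that bin's open and close.
    """
    bars = {}  # dicts keep insertion order, so bins stay in time order
    for ts, price in trades:
        # Truncate to the start of the minute (a one-minute lower bin)
        bin_start = ts.replace(second=0, microsecond=0)
        bar = bars.get(bin_start)
        if bar is None:
            bars[bin_start] = {"open": price, "high": price, "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
    return bars


# Toy trades for one symbol (made-up prices)
t0 = datetime(2024, 1, 1, 9, 30)
trades = [
    (t0, 10.0),
    (t0 + timedelta(seconds=20), 10.5),
    (t0 + timedelta(seconds=40), 9.8),
    (t0 + timedelta(seconds=70), 10.1),
]
bars = ohlc_by_minute(trades)
for start, bar in bars.items():
    print(start.time(), bar)
```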

## API Reference
```{eval-rst}
.. autofunction:: deephaven.plot.express.candlestick
16 changes: 10 additions & 6 deletions plugins/plotly-express/docs/ecdf.md
@@ -1,14 +1,18 @@
# ECDF Plot

> [!WARNING]
> This plot type is not yet implemented.

An Empirical Cumulative Distribution Function (ECDF) plot is a non-parametric statistical tool used to visualize the distribution of data. It displays the cumulative proportion of data points that are less than or equal to a given value, providing insights into data spread and characteristics without making assumptions about the underlying probability distribution.

Interpreting an ECDF involves examining its curve, which represents the cumulative proportion of data points at or below a given value. The steepness of the ECDF curve at a particular point indicates the density of data points at that value: steeper segments imply higher density, while flatter segments suggest lower density. Comparing an ECDF to a histogram of the same data can reveal the correspondence between continuous and discrete representations of the data distribution. The ECDF provides a more detailed view of the data's distribution, whereas a histogram simplifies the distribution into discrete bins, making it easier to identify data patterns, central tendencies, and outliers.
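The ECDF curve can be computed by sorting the data and accumulating counts. The plain-Python sketch below returns the corners of the step curve, one `(x, proportion)` pair per distinct value. It is an illustration only, not the Deephaven Express implementation (which, per the warning above, does not yet exist), and `ecdf_points` is a hypothetical helper.

```python
from collections import Counter


def ecdf_points(values):
    """Return (x, F(x)) pairs at each distinct value: the corners of the ECDF step curve."""
    n = len(values)
    counts = Counter(values)
    points, running = [], 0
    for x in sorted(counts):
        running += counts[x]  # number of observations <= x so far
        points.append((x, running / n))
    return points


points = ecdf_points([3, 1, 4, 1, 5])
print(points)  # [(1, 0.4), (3, 0.6), (4, 0.8), (5, 1.0)]
```

The value 1 appears twice, so the curve jumps by 0.4 there instead of 0.2: a larger jump marks a denser region of the data.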

Empirical Cumulative Distribution Function (ECDF) are useful for:
#### When are ECDF plots appropriate?

ECDF plots are appropriate when the data contain a continuous variable of interest.

1. **Distribution Visualization**: When you want to visualize the distribution of a dataset and understand the cumulative behavior of data points.
2. **Comparison of Datasets**: For comparing the distributions of multiple datasets or variables, particularly when assessing how they differ or overlap.
3. **Outlier Identification**: To identify potential outliers and extreme values within the dataset, as they often stand out in the ECDF plot.
4. **Hypothesis Testing**: When conducting hypothesis tests, an ECDF can help assess whether the observed data conforms to a specific theoretical distribution or model, aiding in statistical analysis and decision-making.
#### What are ECDF plots useful for?

## Examples
- **Distribution Visualization**: When you want to visualize the distribution of a dataset and understand the cumulative behavior of data points.
- **Comparison to Normality**: An ECDF is often plotted against the cumulative distribution function (CDF) of an appropriate normal distribution to indicate whether the data are approximately normally distributed.
- **Computing Empirical Percentiles**: ECDF plots can be used to read off the empirical percentile of any given value in a dataset, giving a quick visual alternative to a laborious calculation.
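Reading an empirical percentile off an ECDF amounts to counting the observations at or below a value. Here is a plain-Python sketch using binary search on the sorted data. `empirical_percentile` and the sample values are hypothetical, and the sketch defines the percentile as the share of observations less than or equal to `x`, which is one of several percentile conventions.

```python
from bisect import bisect_right


def empirical_percentile(values, x):
    """Percentage of observations <= x, i.e. the ECDF evaluated at x, times 100."""
    data = sorted(values)
    # bisect_right returns how many sorted values are <= x
    return 100 * bisect_right(data, x) / len(data)


values = [12, 15, 15, 18, 20, 22, 25, 30]
print(empirical_percentile(values, 20))  # 62.5
```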