From 4c556d39252c51275739ed3f6917ee6017d30294 Mon Sep 17 00:00:00 2001 From: Alex Peters <80283343+alexpeters1208@users.noreply.github.com> Date: Mon, 29 Jul 2024 10:40:48 -0500 Subject: [PATCH] docs: MVP plotly-express docs (#554) Minimum required docs for the plotly-express plugin. Here are the outstanding items: 1. Fill out "other". 2. Document `ecdf` once it is implemented. Directions for testing: As of 7/17, everything needed for testing is baked into a release. Here's a simple testing environment using pip-installed DH. ``` # make new dir for testing mkdir test-dx && cd test-dx # create env for installs python -m venv test-dx-venv source test-dx-venv/bin/activate # install some necessary things for the build pip install --upgrade pip setuptools # install the server, need 35.1 or 34.3 pip install deephaven-server==0.35.1 # install the plugin pip install deephaven-plugin-plotly-express # I need to do this to get `which deephaven` to give the correct venv version, you may not deactivate source test-dx-venv/bin/activate # start the server deephaven server ``` --------- Co-authored-by: margaretkennedy <82049573+margaretkennedy@users.noreply.github.com> --- .gitignore | 3 + plugins/plotly-express/docs/README.md | 17 + plugins/plotly-express/docs/area.md | 41 +- plugins/plotly-express/docs/bar.md | 51 +- plugins/plotly-express/docs/box.md | 41 +- plugins/plotly-express/docs/candlestick.md | 41 +- .../plotly-express/docs/density_heatmap.md | 77 +-- plugins/plotly-express/docs/ecdf.md | 14 - plugins/plotly-express/docs/funnel-area.md | 27 +- plugins/plotly-express/docs/funnel.md | 30 +- plugins/plotly-express/docs/histogram.md | 63 ++- plugins/plotly-express/docs/icicle.md | 31 +- plugins/plotly-express/docs/layer-plots.md | 36 +- plugins/plotly-express/docs/line-3d.md | 33 +- plugins/plotly-express/docs/line-polar.md | 23 +- plugins/plotly-express/docs/line-ternary.md | 22 +- plugins/plotly-express/docs/line.md | 392 +-------------- 
plugins/plotly-express/docs/multiple-axes.md | 63 ++- plugins/plotly-express/docs/ohlc.md | 42 +- plugins/plotly-express/docs/other.md | 19 - plugins/plotly-express/docs/pie.md | 30 +- plugins/plotly-express/docs/plot-by.md | 143 +++++- plugins/plotly-express/docs/scatter-3d.md | 172 ++++--- plugins/plotly-express/docs/scatter-polar.md | 21 +- .../plotly-express/docs/scatter-ternary.md | 21 +- plugins/plotly-express/docs/scatter.md | 459 ++++++++---------- plugins/plotly-express/docs/sidebar.json | 8 +- plugins/plotly-express/docs/strip.md | 42 +- plugins/plotly-express/docs/sub-plots.md | 35 +- plugins/plotly-express/docs/sunburst.md | 30 +- plugins/plotly-express/docs/timeline.md | 14 +- plugins/plotly-express/docs/treemap.md | 29 +- plugins/plotly-express/docs/violin.md | 37 +- 33 files changed, 1209 insertions(+), 898 deletions(-) delete mode 100644 plugins/plotly-express/docs/ecdf.md delete mode 100644 plugins/plotly-express/docs/other.md diff --git a/.gitignore b/.gitignore index ae0a65d79..9b472a161 100644 --- a/.gitignore +++ b/.gitignore @@ -35,3 +35,6 @@ playwright/.cache/ # virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml hs_err_pid* replay_pid* + +plugins-venv/ +plugins-dev-venv/ \ No newline at end of file diff --git a/plugins/plotly-express/docs/README.md b/plugins/plotly-express/docs/README.md index 15f044b98..9752ec288 100644 --- a/plugins/plotly-express/docs/README.md +++ b/plugins/plotly-express/docs/README.md @@ -117,6 +117,23 @@ my_plot = dx.line(table=my_table, x="Timestamp", y="Price", color="Sym") In this example, we create a Deephaven table and create a line plot of `Timestamp` against `Price` with automatic downsampling. A trace is created for each value in the `Sym` column, each of which has a unique color. 
+## Documentation Terminology + +The documentation for Deephaven Express routinely uses some common terms to help clarify how plots are intended to be used: + +- **Variable**: Variables, usually represented as columns in a Deephaven table, are a series of data points or observations of a particular characteristic in the data set. Examples include age, GDP, stock price, wind direction, sex, zip code, shoe size, square footage, and height. + +The following terms define different types of variable. Variable types are important because any given plot is usually only intended to be used with a specific variable type: + +- **Categorical variable**: This is a variable with a countable (often small) number of possible measurements for which an average cannot be computed. Examples include sex, country, flower species, stock symbol, and last name. Zip code is also a categorical variable, because while it is made of numbers and can technically be averaged, the "average zip code" is not a sensible concept. +- **Discrete numeric variable** (often abbreviated to _discrete variable_): This is a variable with a countable number of possible measurements for which an average can be computed. These are typically represented with whole numbers. Examples include the number of wins in a season, number of bedrooms in a house, the size of one's immediate family, and the number of letters in a word. +- **Continuous numeric variable** (often abbreviated to _continuous variable_): This is a variable with a spectrum of possible measurements for which an average can be computed. These are typically represented with decimal or fractional numbers. Examples include height, square footage of a home, length of a flower petal, price of a stock, and the distance between two stars. + +The following terms define relationships between variables. 
They do not describe attributes of a variable, but describe how a variable relates to others: + +- **Explanatory variable**: A variable that other variables depend on in some important way. The most common example is time. If explanatory variables are displayed in a plot, they are presented on the x-axis by convention. +- **Response variable**: A variable that depends directly on another variable (the explanatory variable) in some important way. A rule of thumb is that explanatory variables are used to make predictions about response variables, but not conversely. If response variables are displayed in a plot, they are presented on the y-axis by convention. + ## Contributing We welcome contributions to Deephaven Plotly Express! If you encounter any issues, have ideas for improvements, or would like to add new features, please open an issue or submit a pull request on the [GitHub repository](https://github.com/deephaven/deephaven-plugins). diff --git a/plugins/plotly-express/docs/area.md b/plugins/plotly-express/docs/area.md index 960644d7e..3d0c9527b 100644 --- a/plugins/plotly-express/docs/area.md +++ b/plugins/plotly-express/docs/area.md @@ -1,16 +1,45 @@ # Area Plot -An area plot, also known as a stacked area chart, is a data visualization that uses multiple filled areas stacked on top of one another to represent the cumulative contribution of distinct categories over a continuous interval or time period. This makes it valuable for illustrating the composition and trends within data, especially when comparing the distribution of different categories. +An area plot, also known as a stacked area chart, is a data visualization that uses multiple filled areas stacked on top of one another to represent the cumulative contribution of distinct categories over a continuous interval or time period. 
Area plots always start the y-axis at zero, because the height of each line at any point is exactly equal to its contribution to the whole, and the proportion of each category's contribution must be represented faithfully. -Area plots are useful for: +Area plots are appropriate when the data contain a continuous response variable that directly depends on a continuous explanatory variable, such as time. Further, the response variable can be broken down into contributions from each of several independent categories, and those categories are represented by an additional categorical variable. -1. **Comparing Category Trends**: Use area plots to compare and track trends in different categories over time, providing a clear view of their cumulative contributions. -2. **Proportional Representation**: When you need to show the relative proportion of different categories within a dataset, area plots offer an effective means of visualizing this information. -3. **Data Composition**: Area plots are ideal for revealing the composition and distribution of data categories, making them useful in scenarios where the relative makeup of categories is crucial. -4. **Time Series Analysis**: For time-dependent data, area plots are valuable for displaying changes in categorical contributions and overall trends over time. +### What are area plots useful for? + +- **Visualizing trends over time**: Area plots are great for displaying the trend of a single continuous variable. The filled areas can make it easier to see the magnitude of changes and trends compared to line plots. +- **Displaying cumulative totals**: Area plots are effective in showing cumulative totals over a period. They can help in understanding the contribution of different categories to the total amount and how these contributions evolve. +- **Comparing multiple categories**: Rather than providing a single snapshot of the composition of a total, area plots show how contributions from each category change over time. 
## Examples +### A basic area plot + +Visualize the relationship between two variables by passing each column name to the `x` and `y` arguments. + +```python order=area_plot,usa_population +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() + +# subset to get a specific group +usa_population = gapminder.where("Country == `United States`") + +area_plot = dx.area(usa_population, x="Year", y="Pop") +``` + +### Area by group + +Area plots are unique in that the y-axis demonstrates each group's total contribution to the whole. Pass the name of the grouping column(s) to the `by` argument. + +```python order=area_plot_group,large_countries_population +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() + +# subset to get several countries to compare +large_countries_population = gapminder.where("Country in `United States`, `India`, `China`") + +# cumulative trend showing contribution from each group +area_plot_group = dx.area(large_countries_population, x="Year", y="Pop", by="Country") +``` ## API Reference ```{eval-rst} diff --git a/plugins/plotly-express/docs/bar.md b/plugins/plotly-express/docs/bar.md index f50ad4941..37503beaf 100644 --- a/plugins/plotly-express/docs/bar.md +++ b/plugins/plotly-express/docs/bar.md @@ -1,18 +1,55 @@ # Bar Plot -A bar plot is a graphical representation of data that uses rectangular bars to display the values of different categories or groups, making it easy to compare and visualize the distribution of data. +A bar plot is a graphical representation of data that uses rectangular bars to display the values of different categories or groups. Bar plots aggregate the response variable across the entire dataset for each category, so that the y-axis represents the sum of the response variable per category. -Advantages of bar plots include: +Bar plots are appropriate when the data contain a continuous response variable that is directly related to a categorical explanatory variable. 
Additionally, if the response variable is a cumulative total of contributions from different subcategories, each bar can be broken up to demonstrate those contributions. -1. **Comparative Clarity**: Bar plots are highly effective for comparing data across different categories or groups. They provide a clear visual representation of relative differences and make it easy to identify trends within the dataset. -2. **Categorical Representation**: Bar plots excel at representing categorical data, such as survey responses, product sales by region, or user preferences. Each category is presented as a distinct bar, simplifying the visualization of categorical information. -3. **Ease of Use**: Bar plots are user-friendly and quick to generate, making them a practical choice for various applications. -4. **Data Aggregation**: Bar plots allow for easy aggregation of data within categories, simplifying the visualization of complex datasets, and aiding in summarizing and comparing information efficiently. +### What are bar plots useful for? -Bar plots have limitations and are not suitable for certain scenarios. They are not ideal for continuous data, ineffective for multi-dimensional data exceeding two dimensions, and unsuitable for time-series data trends. Additionally, they become less practical with extremely sparse datasets and are inadequate for representing complex interactions or correlations among multiple variables. +- **Comparing categorical data**: Bar plots are ideal for comparing the quantities or frequencies of different categories. The height of each bar represents the value of each category, making it easy to compare them at a glance. +- **Decomposing data by category**: When the data belong to several independent categories, bar plots make it easy to visualize the relative contributions of each category to the overall total. The bar segments are colored by category, making it easy to identify the contribution of each. 
+- **Tracking trends**: If the categorical explanatory variable can be ordered left-to-right (like day of week), then bar plots provide a visualization of how the response variable changes as the explanatory variable evolves. ## Examples +### A basic bar plot + +Visualize the relationship between a continuous variable and a categorical or discrete variable by passing the column names to the `x` and `y` arguments. + +```python order=bar_plot,tips +import deephaven.plot.express as dx +tips = dx.data.tips() + +bar_plot = dx.bar(tips, x="Day", y="TotalBill") +``` + +Change the x-axis ordering by sorting the dataset by the categorical variable. + +```python order=ordered_bar_plot,tips +import deephaven.plot.express as dx +tips = dx.data.tips() + +# sort the dataset to get a specific x-axis ordering, sort() acts alphabetically +ordered_bar_plot = dx.bar(tips.sort("Day"), x="Day", y="TotalBill") +``` + +### Partition bars by group + +Break bars down by group by passing the name of the grouping column(s) to the `by` argument. + +```python order=bar_plot_smoke,bar_plot_sex,tips +import deephaven.plot.express as dx +tips = dx.data.tips() + +sorted_tips = tips.sort("Day") + +# group by smoker / non-smoker +bar_plot_smoke = dx.bar(sorted_tips, x="Day", y="TotalBill", by="Smoker") + +# group by male / female +bar_plot_sex = dx.bar(sorted_tips, x="Day", y="TotalBill", by="Sex") +``` + ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.bar diff --git a/plugins/plotly-express/docs/box.md b/plugins/plotly-express/docs/box.md index 7fea88941..d35a51cc5 100644 --- a/plugins/plotly-express/docs/box.md +++ b/plugins/plotly-express/docs/box.md @@ -1,16 +1,45 @@ # Box Plot -A box plot, also known as a box-and-whisker plot, is a data visualization that presents a summary of a dataset's distribution. 
It displays key statistics such as the median, quartiles, and potential outliers, making it a useful tool for visually representing the central tendency and variability of data. +A box plot, also known as a box-and-whisker plot, is a data visualization that presents a summary of a dataset's distribution. It displays key statistics such as the median, quartiles, and potential outliers, making it a useful tool for visually representing the central tendency and variability of data. To learn more about the mathematics involved in creating box plots, check out [this article](https://asq.org/quality-resources/box-whisker-plot). -Box plots are useful for: +Box plots are appropriate when the data have a continuous variable of interest. If there is an additional categorical variable that the variable of interest depends on, side-by-side box plots may be appropriate using the `by` argument. -1. **Visualizing Spread and Center**: Box plots provide a clear representation of the spread and central tendency of data, making it easy to understand the distribution's characteristics. -2. **Identification of Outliers**: They are effective in identifying outliers within a dataset, helping to pinpoint data points that deviate significantly from the norm. -3. **Comparative Analysis**: Box plots allow for easy visual comparison of multiple datasets or categories, making them useful for assessing variations and trends in data. -4. **Robustness**: Box plots are robust to extreme values and data skewness, providing a reliable means of visualizing data distributions even in the presence of outliers or non-normal data. +### What are box plots useful for? + +- **Visualizing overall distribution**: Box plots reveal the distribution of the variable of interest. They are good first-line tools for assessing whether a variable's distribution is symmetric, right-skewed, or left-skewed. 
+- **Assessing center and spread**: A box plot displays the center (median) of a dataset using the middle line, and displays the spread (IQR) using the width of the box. +- **Identifying potential outliers**: The dots displayed in a box plot are considered candidates for being outliers. These should be examined closely, and their frequency can help determine whether the data come from a heavy-tailed distribution. ## Examples +### A basic box plot + +Visualize the distribution of a single variable by passing the column name to `x` or `y`. + +```python order=box_plot_x,box_plot_y,tips +import deephaven.plot.express as dx +tips = dx.data.tips() + +# control the plot orientation using `x` or `y` +box_plot_x = dx.box(tips, x="TotalBill") +box_plot_y = dx.box(tips, y="TotalBill") +``` + +### Distributions for multiple groups + +Box plots are useful for comparing the distributions of two or more groups of data. Pass the name of the grouping column(s) to the `by` argument. + +```python order=box_plot_group_1,box_plot_group_2,tips +import deephaven.plot.express as dx +tips = dx.data.tips() + +# total bill distribution by Smoker / non-Smoker +box_plot_group_1 = dx.box(tips, y="TotalBill", by="Smoker") + +# total bill distribution by male / female +box_plot_group_2 = dx.box(tips, y="TotalBill", by="Sex") +``` + ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.box diff --git a/plugins/plotly-express/docs/candlestick.md b/plugins/plotly-express/docs/candlestick.md index c56ac79db..9b39afdd5 100644 --- a/plugins/plotly-express/docs/candlestick.md +++ b/plugins/plotly-express/docs/candlestick.md @@ -6,15 +6,46 @@ Interpreting a candlestick chart involves understanding the visual representatio In a bullish (upward, typically shown as green) candlestick, the open is typically at the bottom of the body, and the close is at the top, indicating a price increase. 
In a bearish (downward, typically shown as red) candlestick, the open is at the top of the body, and the close is at the bottom, suggesting a price decrease. One can use these patterns, along with the length of the wicks and the context of adjacent candlesticks, to analyze trends. -Candlestick plots are useful for: +### What are candlestick plots useful for? -1. **Analyzing Financial Markets**: They are a standard tool in technical analysis for understanding price movements, identifying trends, and potential reversal points in financial markets, such as stocks, forex, and cryptocurrencies. -2. **Short to Medium-Term Trading**: Candlestick patterns are well-suited for short to medium-term trading strategies, where timely decisions are based on price patterns and trends over a specific time frame. -3. **Pattern Recognition**: They aid in recognizing and interpreting common candlestick patterns, which can provide insights into market sentiment and potential price movements. -4. **Visualizing Variation in Price Data**: Candlestick charts offer a visually intuitive way to represent variability in price data, making them valuable for traders and analysts who prefer a visual approach to data analysis. +- **Analyzing financial markets**: Candlestick plots are a standard tool in technical analysis for understanding price movements, identifying trends, and potential reversal points in financial instruments, such as stocks, forex, and cryptocurrencies. +- **Short to medium-term trading**: Candlestick patterns are well-suited for short to medium-term trading strategies, where timely decisions are based on price patterns and trends over a specific time frame. +- **Visualizing variation in price data**: Candlestick plots offer a visually intuitive way to represent variability in price data, making them valuable for traders and analysts who prefer a visual approach to data analysis. 
## Examples +### A basic candlestick plot + +Visualize the key summary statistics of a stock price as it evolves. Specify the timestamp column with `x`, and pass the appropriate column names to the `open`, `high`, `low`, and `close` arguments. + +```python order=candlestick_plot,stocks_1min_ohlc,stocks +import deephaven.plot.express as dx +import deephaven.agg as agg +stocks = dx.data.stocks() + +# compute ohlc per symbol for each minute +stocks_1min_ohlc = stocks.update_view( + "BinnedTimestamp = lowerBin(Timestamp, 'PT1m')" +).agg_by( + [ + agg.first("Open=Price"), + agg.max_("High=Price"), + agg.min_("Low=Price"), + agg.last("Close=Price"), + ], + by=["Sym", "BinnedTimestamp"], +) + +candlestick_plot = dx.candlestick( + stocks_1min_ohlc.where("Sym == `DOG`"), + x="BinnedTimestamp", + open="Open", + high="High", + low="Low", + close="Close", +) +``` + ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.candlestick diff --git a/plugins/plotly-express/docs/density_heatmap.md b/plugins/plotly-express/docs/density_heatmap.md index 1e4584b7a..0e903c46b 100644 --- a/plugins/plotly-express/docs/density_heatmap.md +++ b/plugins/plotly-express/docs/density_heatmap.md @@ -1,14 +1,12 @@ # Density Heatmap Plot -A density heatmap plot is a data visualization that uses a colored grid to represent a count over two columns or more (generally an aggregation over three columns). The grid is divided into cells colored based on the aggregated value of the data points that fall within each cell. Passing in one independent variable and one dependent variable provides an approximating replacement for a scatter plot when there are too many data points to be easily visualized. Providing two independent variables and a third dependent variable allows for a more general aggregation to assess a specific metric of the data distribution. The number of grid bins significantly impacts the visualization. Currently, the grid bins default to 10 on each axis. 
+A density heatmap plot is a data visualization that uses a colored grid to represent the joint distribution of a pair of continuous variables. More generally, density heatmaps can be used to visualize any statistical aggregation over a pair of continuous variables. The pair of continuous variables may be explanatory and response variables. In this case, a density heatmap provides an approximation to a scatter plot when there are too many data points to be easily visualized. The number of grid bins significantly impacts the visualization. Currently, the grid bins default to 10 on each axis, yielding 100 bins in total. -#### When are density heatmap plots appropriate? - -Density heatmap plots are appropriate when the data contains two continuous variables of interest and optionally a third dependent variable. +Density heatmaps are appropriate when the data contain two continuous variables of interest. An additional quantitative variable may be incorporated into the visualization using shapes or colors. #### What are density heatmap plots useful for? -- **Scatter Plot Replacement**: When dealing with a large number of data points, density heatmaps provide a more concise, informative and performant visualization than a scatter plot. +- **Scatter Plot Replacement**: When dealing with a large number of data points, density heatmaps provide a more concise, informative and performant visualization than a [scatter plot](scatter.md). - **2D Density Estimation**: Density heatmaps can serve as the basis for 2D density estimation methods, helping to model and understand underlying data distributions, which is crucial in statistical analysis and machine learning. - **Metric Assessment**: By aggregating data points within each cell, density heatmaps can provide insights into the distribution of a specific metric or value across different regions, highlighting groups for further analysis. 
@@ -16,29 +14,27 @@ Density heatmap plots are appropriate when the data contains two continuous vari ### A basic density heatmap -Visualize the counts of data points between two continuous variables within a grid. This could possibly replace a scatter plot when there are too many data points to be easily visualized. +Visualize the joint distribution of two variables by passing each column name to the `x` and `y` arguments. ```python order=heatmap,iris import deephaven.plot.express as dx iris = dx.data.iris() -# Create a basic density heatmap by specifying columns for the `x` and `y` axes -heatmap = dx.density_heatmap(iris, x="petal_length", y="petal_width") +heatmap = dx.density_heatmap(iris, x="PetalLength", y="PetalWidth") ``` ### A density heatmap with a custom color scale -Visualize the counts of data points between two continuous variables within a grid with a custom color scale. +Custom color scales can be provided to the `color_continuous_scale` argument, and their range can be defined with the `range_color` argument. ```py order=heatmap_colorscale,iris import deephaven.plot.express as dx -iris = dx.data.iris() # Import a ticking version of the Iris dataset +iris = dx.data.iris() -# Color the heatmap using the "viridis" color scale with a range from 5 to 8 -heatmap_colorscale = dx.density_heatmap( - iris, - x="petal_length", - y="petal_width", +# use the "viridis" color scale with a range from 5 to 8 +heatmap_colorscale = dx.density_heatmap(iris, + x="PetalLength", + y="PetalWidth", color_continuous_scale="viridis", range_color=[5, 8] ) @@ -46,18 +42,18 @@ heatmap_colorscale = dx.density_heatmap( ### A density heatmap with a custom grid size and range -Visualize the counts of data points between two continuous variables within a grid with a custom grid size and range. The number of bins significantly impacts the visualization by changing the granularity of the grid. +The number of bins on each axis can be set using the `nbinsx` and `nbinsy` arguments. 
The number of bins significantly impacts the visualization by changing the granularity of the grid. ```py order=heatmap_bins,iris import deephaven.plot.express as dx -iris = dx.data.iris() # import a ticking version of the Iris dataset +iris = dx.data.iris() # Create a density heatmap with 20 bins on each axis and a range from 3 to the maximum value for the x-axis. # None is used to specify an upper bound of the maximum value. heatmap_bins = dx.density_heatmap( iris, - x="petal_length", - y="petal_width", + x="PetalLength", + y="PetalWidth", nbinsx=20, nbinsy=20, range_bins_x=[3, None], @@ -66,23 +62,48 @@ heatmap_bins = dx.density_heatmap( ### A density heatmap with a custom aggregation function -Visualize the average of a third dependent continuous variable across the grid. Histfuncs can only be used when three columns are provided. Possible histfuncs are `"abs_sum"`, `"avg"`, `"count"`, `"count_distinct"`, `"max"`, `"median"`, `"min"`, `"std"`, `"sum"`, and `"var"`. +Use an additional continuous variable to color the heatmap. Many statistical aggregations can be computed on this column by providing the `histfunc` argument. Possible values for the `histfunc` are `"abs_sum"`, `"avg"`, `"count"`, `"count_distinct"`, `"max"`, `"median"`, `"min"`, `"std"`, `"sum"`, and `"var"`. ```py order=heatmap_aggregation,iris - import deephaven.plot.express as dx -iris = dx.data.iris() # import a ticking version of the Iris dataset +iris = dx.data.iris() -# Create a density heatmap with an average aggregation function. 
-heatmap_aggregation = dx.density_heatmap( - iris, - x="petal_length", - y="petal_width", - z="sepal_length", +# color the map by the average of an additional continuous variable +heatmap_aggregation = dx.density_heatmap(iris, + x="PetalLength", + y="PetalWidth", + z="SepalLength", histfunc="avg" ) ``` +### Large datasets + +Visualize the joint distribution of a large dataset (10 million rows in this example) by passing each column name to the `x` and `y` arguments. Increasing the number of bins can produce a much smoother visualization. + +```python order=large_heatmap_2,large_heatmap_1,large_data +from deephaven.plot import express as dx +from deephaven import empty_table + +large_data = empty_table(10_000_000).update([ + "X = 50 + 25 * cos(i * Math.PI / 180)", + "Y = 50 + 25 * sin(i * Math.PI / 180)", +]) + +# specify range to see entire plot +large_heatmap_1 = dx.density_heatmap(large_data, x="X", y="Y", range_bins_x=[0,100], range_bins_y=[0,100]) + +# using bins may be useful for more precise visualizations +large_heatmap_2 = dx.density_heatmap( + large_data, + x="X", + y="Y", + range_bins_x=[0,100], + range_bins_y=[0,100], + nbinsx=100, + nbinsy=100 +) +``` ## API Reference ```{eval-rst} diff --git a/plugins/plotly-express/docs/ecdf.md b/plugins/plotly-express/docs/ecdf.md deleted file mode 100644 index 7bfd4e9bd..000000000 --- a/plugins/plotly-express/docs/ecdf.md +++ /dev/null @@ -1,14 +0,0 @@ -# ECDF Plot - -An Empirical Cumulative Distribution Function (ECDF) plot is a non-parametric statistical tool used to visualize the distribution of data. It displays the cumulative proportion of data points that are less than or equal to a given value, providing insights into data spread and characteristics without making assumptions about the underlying probability distribution. - -Interpreting an Empirical Cumulative Distribution Function (ECDF) involves examining its curve, which represents the cumulative proportion of data points below a given value. 
The steepness of the ECDF curve at a particular point indicates the density of data points at that value. Steeper segments imply higher density, while flatter segments suggest lower density. Comparing an ECDF to a histogram of the same data can reveal the correspondence between continuous and discrete representations of the data distribution. The ECDF provides a more detailed view of the data's distribution, whereas a histogram simplifies the distribution into discrete bins, making it easier to identify data patterns, central tendencies, and outliers. - -Empirical Cumulative Distribution Function (ECDF) are useful for: - -1. **Distribution Visualization**: When you want to visualize the distribution of a dataset and understand the cumulative behavior of data points. -2. **Comparison of Datasets**: For comparing the distributions of multiple datasets or variables, particularly when assessing how they differ or overlap. -3. **Outlier Identification**: To identify potential outliers and extreme values within the dataset, as they often stand out in the ECDF plot. -4. **Hypothesis Testing**: When conducting hypothesis tests, an ECDF can help assess whether the observed data conforms to a specific theoretical distribution or model, aiding in statistical analysis and decision-making. - -## Examples diff --git a/plugins/plotly-express/docs/funnel-area.md b/plugins/plotly-express/docs/funnel-area.md index 38709c57b..be01c21e6 100644 --- a/plugins/plotly-express/docs/funnel-area.md +++ b/plugins/plotly-express/docs/funnel-area.md @@ -1,16 +1,31 @@ # Funnel Area Plot -A funnel area plot is a data visualization that is typically used to represent data where values progressively decrease or "funnel" through stages or categories. It takes the form of a series of horizontally aligned trapezoids or polygons, with each stage's area proportional to the quantity it represents, making it a useful tool for visualizing the attrition or progression of data through a sequential process. 
+A funnel area plot is a data visualization that is typically used to represent data where values progressively decrease or "funnel" through stages or categories. It takes the form of a series of horizontally aligned trapezoids or polygons, with each stage's area proportional to the quantity it represents, making it a useful tool for visualizing the attrition or progression of data through a sequential process. The data must be ordered by the response variable, or the "funnel" shape will not be guaranteed. -Funnel area plots are useful for: +Funnel area plots differ from [funnel plots](funnel.md) in that they display the percentage of data points that belong to each category, while funnel plots display the absolute count of data points in each category. Funnel area plots also count each data point as belonging to _exactly one_ category and display the categories as mutually exclusive. On the other hand, funnel plots count each data point as belonging to _at least one_ category, so the categories are represented as subsets of each other rather than mutually exclusive. -1. **Sequential Data**: When visualizing data that follows a sequential or staged progression. -2. **Progression Analysis**: For analyzing attrition, conversion rates, or transitions between stages. -3. **Efficiency Evaluation**: To assess the efficiency and effectiveness of a process or workflow. -4. **Data Funneling**: When representing data where values funnel through stages or categories. +Funnel area plots are appropriate when the data contain a categorical variable where the frequencies of each category can be computed, and the categories can be ordered. Additionally, funnel plots assume a particular relationship between levels of the categorical variable, where each category is a proper subset of the previous category. If the data contain an unordered categorical variable, or the categories are better conceptualized as parts of a whole, consider a pie plot instead of a funnel area plot. 
+ +### What are funnel area plots useful for? + +- **Visualizing sequential data**: Data that are staged or sequential in some way are often visualized with funnel area plots, yielding insight on the rate of change from one stage to the next. +- **Analyzing data progression**: Funnel area plots may be used for analyzing attrition, conversion rates, or transitions between stages. +- **Evaluating efficiency**: Assessing the efficiency and effectiveness of a process or workflow is easy with funnel area plots. ## Examples +### A basic funnel area plot + +Visualize the trend in consecutive stages of a categorical variable by passing column names to the `names` and `values` arguments. + +```python order=funnel_area_plot,marketing +import deephaven.plot.express as dx +marketing = dx.data.marketing() + +# `Count` is the frequency/value column, and `Stage` is the category column +funnel_area_plot = dx.funnel_area(marketing, names="Stage", values="Count") +``` + ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.funnel_area diff --git a/plugins/plotly-express/docs/funnel.md b/plugins/plotly-express/docs/funnel.md index 523d431f0..782f8c6a4 100644 --- a/plugins/plotly-express/docs/funnel.md +++ b/plugins/plotly-express/docs/funnel.md @@ -1,16 +1,32 @@ # Funnel Plot -A funnel plot is a data visualization that represents a process with various stages and allows multiple stacked categories, showing the quantitative values or counts at each stage in a funnel shape. It is a useful tool for tracking the progression or attrition of data through different stages, providing a visual overview of data distribution within the process. A funnel area plot, on the other hand, is another visualization that represents data progressing through stages, but it uses filled polygons to depict the proportional quantity of data at each stage, making it a valuable tool for comparing the relative size of categories within a process but can only represent one category.
+A funnel plot is a data visualization that represents a process with various stages and allows multiple stacked categories, showing the quantitative values or counts at each stage in a funnel shape. It is a useful tool for tracking the progression or attrition of data through different stages, providing a visual overview of data distribution within the process. The data must be ordered by the response variable, or the "funnel" shape will not be guaranteed. -Funnel plots are useful for: +Funnel plots differ from [funnel area plots](funnel-area.md) in that they display the absolute count of data points in each category, while funnel area plots display the percentage of data points that belong to each category. Funnel plots also count each data point as belonging to _at least one_ category, so the categories are represented as subsets of each other. On the other hand, funnel area plots count each data point as belonging to _exactly one_ category, and display the categories as mutually exclusive. -1. **Progression Analysis**: When you need to analyze the progression, attrition, or conversion rates of data as it moves through different stages and by categories. -2. **Sequential Processes**: Funnel plots are suitable for visualizing data within sequential processes, where data typically funnels through various stages. -3. **Data Distribution**: When you want to gain insights into the distribution of data at each stage within a process, and you can represent multiple categories as stacked bars for comparative analysis. -4. **Efficiency Assessment**: To assess the efficiency and effectiveness of a process, particularly when evaluating the attrition or conversion of elements at each stage. +Funnel plots are appropriate when the data contain a categorical variable where the frequencies of each category can be computed, and the categories can be ordered.
Additionally, funnel plots assume a particular relationship between levels of the categorical variable, where each category is a proper subset of the previous category. If the data contain an unordered categorical variable, or the categories are better conceptualized as parts of a whole, consider a pie plot instead of a funnel plot. +### What are funnel plots useful for? + +- **Visualizing sequential data**: Data that are staged or sequential in some way are often visualized with funnel plots, yielding insight on the absolute changes between each stage. +- **Comparing categories**: Funnel plots can be broken down into categories to produce insights into the distribution of data at each stage within a process. +- **Evaluating efficiency**: Assessing the efficiency and effectiveness of a process or workflow, particularly when evaluating the attrition or conversion at each stage, is easy with funnel plots. + +## Examples + +### A basic funnel plot + +Visualize the trend in consecutive stages of a categorical variable by passing column names to the `x` and `y` arguments. + +```python order=funnel_plot,marketing +import deephaven.plot.express as dx +marketing = dx.data.marketing() + +# `Count` is the frequency/value column, and `Stage` is the category column +funnel_plot = dx.funnel(marketing, x="Count", y="Stage") +``` ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.funnel -``` \ No newline at end of file +``` diff --git a/plugins/plotly-express/docs/histogram.md b/plugins/plotly-express/docs/histogram.md index fedd71d69..83f60eb18 100644 --- a/plugins/plotly-express/docs/histogram.md +++ b/plugins/plotly-express/docs/histogram.md @@ -1,16 +1,67 @@ # Histogram Plot -A histogram plot is a data visualization technique commonly used in statistics and data analysis to represent the distribution of a dataset.
It consists of a series of contiguous, non-overlapping bars that provide a visual summary of the frequency or density of data points within predefined intervals or "bins." The number of bins used has a significant impact on the vizualization, and this number currently must be set manually. +A histogram plot is a data visualization technique commonly used in statistics and data analysis to visualize the distribution of a single continuous variable. It consists of a series of contiguous, non-overlapping bars that provide a visual summary of the frequency or density of data points within predefined intervals or "bins." The number of bins significantly impacts the visualization. -Histogram plots are useful for: +Histograms are appropriate when the data contain a continuous variable of interest. If there is an additional categorical variable that the variable of interest depends on, layered histograms may be appropriate using the `by` argument. -1. **Data Distribution Analysis**: Histograms are a valuable tool to gain insights into the distribution of a dataset, making it easier to understand the central tendencies, spread, and skewness of the data. -2. **Identifying Outliers**: Histograms help in detecting outliers or anomalies in a dataset by highlighting data points that fall outside the typical distribution. -3. **Quantitative Comparison**: When comparing the distribution of multiple datasets or subsets, histograms provide a straightforward visual means of assessing differences in data patterns. -4. **Density Estimation**: Histograms can serve as the basis for density estimation methods, helping to model and understand underlying data distributions, which is crucial in statistical analysis and machine learning. +### What are histograms useful for? + +- **Data distribution analysis**: Histograms are a valuable tool to gain insights into the distribution of a dataset, making it easier to understand the central tendencies, spread, and skewness of the data. 
+- **Identifying outliers**: Histograms help in detecting outliers or anomalies in a dataset by highlighting data points that fall outside the typical distribution. +- **Density estimation**: Histograms can serve as the basis for density estimation methods, helping to model and understand underlying data distributions, which is crucial in statistical analysis and machine learning. ## Examples +### A basic histogram + +Visualize the distribution of a single variable by passing the column name to the `x` or `y` argument. + +```python order=hist_plot_x,hist_plot_y,setosa,iris +import deephaven.plot.express as dx +iris = dx.data.iris() + +# subset to get specific species +setosa = iris.where("Species == `setosa`") + +# control the plot orientation using `x` or `y` +hist_plot_x = dx.histogram(setosa, x="SepalLength") +hist_plot_y = dx.histogram(setosa, y="SepalLength") +``` + +Modify the number of bins by setting `nbins` equal to the number of desired bins. + +```python order=hist_20_bins,hist_3_bins,hist_8_bins,virginica,iris +import deephaven.plot.express as dx +iris = dx.data.iris() + +# subset to get specific species +virginica = iris.where("Species == `virginica`") + +# too many bins will produce jagged, disconnected histograms +hist_20_bins = dx.histogram(virginica, x="SepalLength", nbins=20) + +# too few bins will mask distributional information +hist_3_bins = dx.histogram(virginica, x="SepalLength", nbins=3) + +# play with the `nbins` parameter to get a good visualization +hist_8_bins = dx.histogram(virginica, x="SepalLength", nbins=8) +``` + +### Distributions of several groups + +Histograms can also be used to compare the distributional properties of different groups of data, though they may be a little harder to read than [box plots](box.md) or [violin plots](violin.md). Pass the name of the grouping column(s) to the `by` argument.
+ +```python order=stacked_hist,overlay_hist,iris +import deephaven.plot.express as dx +iris = dx.data.iris() + +# each bin may be stacked side-by-side for each group +stacked_hist = dx.histogram(iris, x="SepalLength", by="Species") + +# or, each bin may be overlaid with the others +overlay_hist = dx.histogram(iris, x="SepalLength", by="Species", barmode="overlay") +``` + ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.histogram diff --git a/plugins/plotly-express/docs/icicle.md b/plugins/plotly-express/docs/icicle.md index 916de4819..ed17e8c69 100644 --- a/plugins/plotly-express/docs/icicle.md +++ b/plugins/plotly-express/docs/icicle.md @@ -2,15 +2,36 @@ Icicle plots, a hierarchical data visualization technique, are used to represent structured data with nested categories or levels. They are characterized by a rectangular layout where each column represents a level of the hierarchy, and the width of each subcolumn is proportional to the quantity of data within its respective category, facilitating the visualization of data structure and distribution. -Icicle plots are useful for: +Icicle plots are appropriate when the data have a hierarchical structure. Each level of the hierarchy consists of a categorical variable and an associated numeric variable with a value for each unique category. -1. **Hierarchical Data Representation**: Icicle charts are particularly useful for visualizing hierarchical data, such as organizational structures, file directories, or nested categorical data. They provide a clear and intuitive way to represent multiple levels of hierarchy in a single view. -2. **Space-efficient Representation**: By using a compact rectangular layout, icicle charts make efficient use of space. This allows for the display of large and complex hierarchies without requiring extensive scrolling or panning, making it easier to analyze and interpret the data at a glance. -3. 
**Interactive Exploration**: Icicle charts often come with interactive features that allow users to drill down into specific branches of the hierarchy. This interactivity enables detailed exploration and analysis of sub-categories, aiding in uncovering insights and patterns within the data. -4. **Comparative Analysis**: The consistent and proportional layout of icicle charts makes them effective for comparing the size and structure of different branches within the hierarchy. Users can easily identify and compare the relative importance or size of various categories, facilitating better decision-making and resource allocation. +### What are icicle plots useful for? + +- **Representing hierarchical data**: Icicle charts are particularly useful for visualizing hierarchical data, such as organizational structures, file directories, or nested categorical data. They provide a clear and intuitive way to represent multiple levels of hierarchy in a single view. +- **Space-efficient plotting**: By using a compact rectangular layout, icicle charts make efficient use of space. This allows for the display of large and complex hierarchies without requiring extensive scrolling or panning, making it easier to analyze and interpret the data at a glance. +- **Comparative analysis**: The consistent and proportional layout of icicle charts makes them effective for comparing the size and structure of different branches within the hierarchy. Users can easily identify and compare the relative importance or size of various categories, facilitating better decision-making and resource allocation. ## Examples +### A basic icicle plot + +Visualize a hierarchical dataset as nested rectangles, with categories displayed left-to-right, and the size of each category displayed top-to-bottom. 
Use the `names` argument to specify the column name for each group's labels, the `values` argument to specify the column name for each group's values, and the `parents` argument to specify the column name for each group's parent category (here, the single root `World`). + +```python order=icicle_plot,gapminder_recent,gapminder +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() + +# create table of only the most recent year of data, compute total population for each continent +gapminder_recent = ( + gapminder + .last_by("Country") + .view(["Continent", "Pop"]) + .sum_by("Continent") + .update("World = `World`") +) + +icicle_plot = dx.icicle(gapminder_recent, names="Continent", values="Pop", parents="World") +``` + ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.icicle diff --git a/plugins/plotly-express/docs/layer-plots.md b/plugins/plotly-express/docs/layer-plots.md index b0969d4e9..47993e1a8 100644 --- a/plugins/plotly-express/docs/layer-plots.md +++ b/plugins/plotly-express/docs/layer-plots.md @@ -1,11 +1,43 @@ # Layer plots -To "layer" or "stack" multiple plots on top of each other, use the `layer` function. This is useful if you want to combine multiple plots of different types into a single plot, such as a scatter plot and a line plot. By default, last plot given will be used for the layout. The `which_layout` parameter can be used to specify which plot's layout should be used. The `specs` parameter can be used to specify the domains of each plot. +To "layer" or "stack" multiple plots on top of each other, use the `layer` function. This is useful if you want to combine multiple plots of different types into a single visualization, such as a [scatter plot](scatter.md) and a [line plot](line.md). This is distinct from [sub-plots](sub-plots.md), which present multiple plots side-by-side. By default, the stacked plot will use the layout (axis labels, axis ranges, and title) from the last plot in the sequence.
The `which_layout` parameter can be used to specify which plot's layout should be used. ## Examples +### Layering two plots + +Use a [candlestick plot](candlestick.md) and a [line plot](line.md) for two different perspectives on the same data. + +```python order=financial_plot,dog_prices,dog_ohlc,stocks +import deephaven.plot.express as dx +import deephaven.agg as agg +stocks = dx.data.stocks() # import the example stock market data set + +# select only DOG prices and compute ohlc +dog_prices = stocks.where("Sym == `DOG`") +dog_ohlc = dog_prices.update_view( + "BinnedTimestamp = lowerBin(Timestamp, 'PT1m')" +).agg_by( + [ + agg.first("Open=Price"), + agg.max_("High=Price"), + agg.min_("Low=Price"), + agg.last("Close=Price"), + ], + by="BinnedTimestamp", +) + +# layer a line plot and a candlestick plot by passing both to layer() +financial_plot = dx.layer( + dx.line( + dog_prices, x="Timestamp", y="Price"), + dx.candlestick( + dog_ohlc, x="BinnedTimestamp", + open="Open", high="High", low="Low", close="Close") +) +``` ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.layer -``` \ No newline at end of file +``` diff --git a/plugins/plotly-express/docs/line-3d.md b/plugins/plotly-express/docs/line-3d.md index 9177fa7c6..dd0684dfe 100644 --- a/plugins/plotly-express/docs/line-3d.md +++ b/plugins/plotly-express/docs/line-3d.md @@ -1,18 +1,37 @@ # 3D Line Plot -3D line plots are a data visualization technique that displays data points as connected line segments in a three-dimensional space. They are used to visualize and analyze data that has dependencies on three variables, facilitating the exploration of patterns, trends, and relationships within the data. +3D line plots are a data visualization technique that displays data points as connected line segments in a three-dimensional space. 
They are used to visualize continuous variables that depend on two continuous independent variables, facilitating the exploration of patterns, trends, and relationships within the data. -3D line plots are useful for: +3D line plots are appropriate when a continuous response variable depends on two continuous explanatory variables. If there is an additional categorical variable that the response variable depends on, shapes or colors can be used in the line plot to distinguish the categories. Further, line plots are preferable to scatter plots when the explanatory variables are ordered. -1. **Multidimensional Data Visualization**: They allow for the representation of data dependent on three variables, providing a more comprehensive view of complex relationships. -2. **Trend Exploration**: 3D line plots are useful for exploring and understanding trends, patterns, and variations in data within a 3D space, making them valuable in scientific and engineering fields. -3. **Data Interaction**: They enable the visualization of data interactions within three-dimensional datasets, aiding in the analysis of data dependencies and correlations. +### What are 3D line plots useful for? + +- **Multidimensional data visualization**: 3D line plots allow for the representation of data in a 3D space, providing a more comprehensive view of complex relationships. +- **Trend exploration**: 3D line plots are useful for exploring and understanding trends, patterns, and variations in data within a 3D space, making them valuable in scientific and engineering fields. +- **Data interaction**: They enable the visualization of data interactions within 3D datasets, aiding in the analysis of data dependencies and correlations. Alternatives to 3D line plots include: -- **Scatter Plots with Color or Size Mapping**: These can be used to represent three variables with the addition of color or size mapping to signify the third dimension.
-- **Surface Plots**: When visualizing continuous data over a 3D space, surface plots may be more appropriate, as they create a continuous surface representation. +- **[Scatter Plots](scatter.md) with Color or Size Mapping**: These can be used to represent three variables with the addition of color or size mapping to signify the third dimension. +- **[Density Heatmaps](density_heatmap.md)**: When visualizing continuous data over a 3D space, density heatmaps may be more appropriate, as they create a continuous surface representation. + +## Examples + +### A basic 3D line plot + +Visualize the relationship between three variables by passing their column names to the `x`, `y`, and `z` arguments. Click and drag on the resulting chart to rotate it for new perspectives. + +```python order=line_plot_3D,spiral +import deephaven.plot.express as dx +from deephaven import time_table + +# create a simple spiral dataset +spiral = time_table("PT0.01s").update_view( + ["X = sin(ii / 100)", "Y = cos(ii / 100)", "Z = 4 * ii / 100"] +) +line_plot_3D = dx.line_3d(spiral, x="X", y="Y", z="Z") +``` ## API Reference ```{eval-rst} diff --git a/plugins/plotly-express/docs/line-polar.md b/plugins/plotly-express/docs/line-polar.md index 353db5eff..bae9e7886 100644 --- a/plugins/plotly-express/docs/line-polar.md +++ b/plugins/plotly-express/docs/line-polar.md @@ -2,15 +2,28 @@ Polar line plots are a type of data visualization that represents data points on a polar coordinate system. They display data as connected line segments extending from the center of a circular plot, often used to illustrate relationships, trends, or patterns within data that have angular or periodic dependencies. -Polar line plots are useful for: +Polar line plots are appropriate when the data contain a continuous variable represented in polar coordinates, with a radial and an angular component instead of the typical x and y components. 
Further, polar line plots are preferable to [polar scatter plots](scatter-polar.md) when the explanatory variables are ordered. -1. **Cyclical Data Analysis**: They are ideal for analyzing cyclical or periodic data, such as daily temperature fluctuations, seasonal patterns, or circular processes in physics and engineering. -2. **Directional Data Representation**: Polar line plots are valuable for representing directional data, such as wind direction, compass bearings, or circular measurements, offering a clear way to visualize and analyze patterns. -3. **Phase or Angular Relationships**: When assessing phase shifts, angular dependencies, or correlations in data, polar line plots provide an intuitive representation for understanding relationships within circular data. -4. **Circular Data Exploration**: They can be used to explore and analyze data where the angular or periodic nature of the data is a significant aspect, making them useful in fields like meteorology, geophysics, and biology. +### What are polar line plots useful for? + +- **Cyclical data analysis**: They are ideal for analyzing cyclical or periodic data, such as daily temperature fluctuations, seasonal patterns, or circular processes in physics and engineering. +- **Representing directional data**: Polar line plots are valuable for representing directional data, such as wind direction, compass bearings, or circular measurements, offering a clear way to visualize and analyze these kinds of patterns. +- **Phase or angular relationships**: When assessing phase shifts, angular dependencies, or correlations in data, polar line plots provide an intuitive representation for understanding relationships within circular data. ## Examples +### A basic polar line plot + +Visualize a dataset in polar coordinates by passing column names to the `r` and `theta` arguments. `theta` may be a string of cardinal directions, as in this example. 
`theta` also supports the use of numeric types that may represent radians or degrees, depending on how the `range_theta` argument is supplied. + +```python order=polar_line_plot,wind +import deephaven.plot.express as dx +wind = dx.data.wind() + +# `by` is used to separate data by groups +polar_line_plot = dx.line_polar(wind, r="Frequency", theta="Direction", by="Strength") +``` + ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.line_polar diff --git a/plugins/plotly-express/docs/line-ternary.md b/plugins/plotly-express/docs/line-ternary.md index 5a0dea2d0..b426c004e 100644 --- a/plugins/plotly-express/docs/line-ternary.md +++ b/plugins/plotly-express/docs/line-ternary.md @@ -2,12 +2,26 @@ Ternary line plots are a data visualization technique that represents data in a triangular coordinate system. They display data as connected line segments within the triangular space, making them useful for visualizing relationships, trends, and compositional data that sum to a constant total. Ternary line plots are particularly valuable when dealing with data involving three mutually exclusive components or proportions. -Ternary line plots are useful for: +Ternary line plots are appropriate when the data contain three interrelated mutually exclusive categories whose relationships can be quantified with a continuous variable. Further, ternary line plots are preferable to [ternary scatter plots](scatter-ternary.md) when the explanatory variables are ordered. -1. **Compositional Data Representation**: Ternary line plots are suitable for representing compositional data where the total proportion remains constant, allowing for the visualization of how components change relative to one another. -2. **Multivariate Data Analysis**: They are useful in multivariate data analysis to visualize relationships and trends among three variables or components that are interrelated. -3. 
**Optimization Studies**: Ternary line plots can be applied in optimization studies to understand how adjustments in the proportions of three components impact the overall composition, aiding in informed decision-making. +### What are ternary line plots useful for? +- **Compositional data representation**: Ternary line plots are suitable for representing compositional data where the total proportion remains constant, allowing for the visualization of how components change relative to one another. +- **Multivariate data analysis**: They are useful in multivariate data analysis to visualize relationships and trends among three variables or components that are interrelated. +- **Optimization studies**: They can be applied in optimization studies to understand how adjustments in the proportions of three components impact the overall composition, aiding in informed decision-making. + +## Examples + +### A basic ternary line plot + +Visualize a ternary dataset by passing column names to each of the `a`, `b`, and `c` arguments. + +```python order=ternary_line_plot,election +import deephaven.plot.express as dx +election = dx.data.election() + +ternary_line_plot = dx.line_ternary(election, a="Joly", b="Coderre", c="Bergeron") +``` ## API Reference ```{eval-rst} diff --git a/plugins/plotly-express/docs/line.md b/plugins/plotly-express/docs/line.md index 8396f11d4..c213c1311 100644 --- a/plugins/plotly-express/docs/line.md +++ b/plugins/plotly-express/docs/line.md @@ -2,398 +2,40 @@ A line plot is a graphical representation that displays data points connected by straight lines, commonly employed in time series analysis to depict temporal trends or relationships in a dataset. -Here are some reasons why you might choose to use a line plot over other types of plots: +Line plots are appropriate when the data contain a continuous response variable that directly depends on a continuous explanatory variable. 
Further, line plots are preferable to [scatter plots](scatter.md) when the explanatory variables are ordered. -1. **Visualizing Trends:** Line plots excel at revealing trends and patterns in data, making them ideal for time series analysis and showcasing changes over a continuous range. -2. **Connecting Data Points:** They effectively connect data points with straight lines, emphasizing the continuity and sequence of values, which is especially useful when dealing with ordered data. -3. **Simplicity and Clarity:** Line plots offer a straightforward and uncluttered representation, enhancing readability and allowing developers to focus on the data's inherent structure. -4. **Comparing Multiple Series:** Line plots make it easy to compare multiple data series on the same graph, aiding in the identification of similarities and differences. -5. **Highlighting Outliers:** Outliers or abrupt changes in data become readily apparent in line plots, aiding in the detection of anomalies or significant events. +### What are line plots useful for? + +- **Visualizing trends:** Line plots excel at revealing trends and patterns in data, making them ideal for time series analysis and showcasing changes over a continuous range. +- **Simplicity and clarity:** Line plots offer a straightforward and uncluttered representation, enhancing readability and allowing developers to focus on the data's inherent structure. +- **Comparing multiple series:** Line plots make it easy to compare multiple data series on the same graph, aiding in the identification of similarities and differences. ## Examples ### A basic line plot -Visualize the relationship between two variables. Column names are passed in directly as `x` and `y`. +Visualize the relationship between two variables by passing each column name to the `x` and `y` arguments. 
```python order=line_plot,my_table import deephaven.plot.express as dx -my_table = dx.data.stocks() # import the example stock market data set -dog_prices = my_table.where("sym = `DOG`") - -# Create a basic line plot by specifying the x and y column -line_plot = dx.line(dog_prices, x="timestamp", y="price") -``` +my_table = dx.data.stocks() -### Color line plot by group +# subset data for just DOG transactions +dog_prices = my_table.where("Sym = `DOG`") -Plot values by group. The query engine performs a `partition_by` on the column provide to the color argument and assigns a unique color to each group. - -```python order=line_plot,mytable -import deephaven.plot.express as dx -my_table = dx.data.stocks() # import the example stock market data set - -# Assign unique colors to each grouping key in a column -line_plot = dx.line(my_table, x="timestamp", y="price", color="sym") +line_plot = dx.line(dog_prices, x="Timestamp", y="Price") ``` -### Color using custom discrete colors - -```python order=scatter_plot_color_sequence,scatter_plot_color_map,scatter_plot_color_column -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. Set custom colors -scatter_plot_color_sequence = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - # group colors by a column - color="species", - # A list of colors to sequentially apply to one or more series - # The colors loop if there are more series than colors - color_discrete_sequence=["salmon", "#fffacd", "rgb(100,149,237)"] -) - -# Ex 2. Set trace colors from a map of colors -scatter_plot_color_map = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - # group colors by a column - color="species", - # set each series to a specific color - color_discrete_map={"virginica":"lemonchiffon", "setosa": "cornflowerblue", "versicolor":"#FA8173"} -) - -# Ex 3. 
Set colors using values from a column -# Generate a column of valid CSS colors to use as an example -table_with_column_of_colors = my_table.update( - "example_colors = `rgb(` + Math.round(Math.random() * 255) + `,` + Math.round(Math.random() * 255) + `,` + Math.round(Math.random() * 255) +`)`" -) - -scatter_plot_color_column = dx.scatter( - table_with_column_of_colors, - x="sepal_width", - y="sepal_length", - color="example_colors", - # When set to `identity`, the column data passed to the - # color parameter will used as the actual color - color_discrete_map="identity", -) -``` - -### Symbols by group - -Symbols can be statically assigned, assigned to a group as part of a `partition_by` operation drawing from a sequence, or from a map. See the symbol list for all available symbols. - - - -```python order=scatter_plot_diamonds,scatter_plot_symbol_by,scatter_plot_symbol_map -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. Assign a custom symbol -scatter_plot_diamonds = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - # See list of available symbols. - symbol_sequence=["diamond"] -) - -# Ex 2. Use symbols to differentiate groups -scatter_plot_symbol_by = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - color="species", - # Assign symbols by group, shown using default symbol_sequence - symbol="species" -) - -# Ex 3. Use a map to assign symbols to groups -scatter_plot_symbol_map = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - color="species", - # Using a map for symbols by value - symbol="species", - symbol_map={"setosa":"cross", "versicolor":"pentagon", "virginica":"star"} -) -``` - -### Error Bars - -Error bars can be set on x and/or y, using values from a column. - -```python order=scatter_plot_error,scatter_plot_error_minus -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. 
Use values from a column as positive and negative error bars -scatter_plot_error = dx.scatter( - my_table.update("error_sepal_width = sepal_width * 0.01"), - x="sepal_width", - y="sepal_length", - error_x="error_sepal_width", -) - -#Ex 2. Use values from two columns for y-positive-error and y-negative-error -scatter_plot_error_minus = dx.scatter( - my_table.update( - [ - # let's pretend these columns represent error - "error_sepal_length_positive = petal_width * 0.25", - "error_sepal_length_negative = petal_length * 0.25", - ] - ), - x="sepal_width", - y="sepal_length", - # will be use as positive and negative error unless _minus is set - error_y="error_sepal_length_positive", - error_y_minus="error_sepal_length_negative", -) -``` - -### Labels and Hover Text - - - -```python order=scatter_plot_title,scatter_plot_axes_titles -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. Label axes using a map -scatter_plot_title = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - # Adds a title label, title supports a subset of html and css - title="Iris Scatter Plot", - # re-label the axis - labels={"sepal_width": "Sepal Width", "sepal_length": "Sepal Length"}, - # adds values from a column as bolded text to hover tooltip - hover_name="species" -) - -# Ex 2. Label multiple axes using an array of strings -scatter_plot_axes_titles = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - xaxis_titles=["Sepal Width"], - yaxis_titles=["Sepal Length"], -) -``` - -### Marginals - -Plot marginals are additional visual representations, like histograms or density plots, displayed alongside the main plot to provide insights into the individual distributions of variables being analyzed. They enhance the understanding of data patterns and trends by showing the univariate distribution of each variable in conjunction with the main plot's visualization. 
- -```python order=scatter_marginal_histogram,scatter_marginal_violin,scatter_marginal_rug,scatter_marginal_box -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +### Line by group -# Ex 1. Histogram style marginals -scatter_marginal_histogram = dx.scatter( - my_table, - x="petal_width", - y="petal_length", - marginal_x="histogram", - marginal_y="histogram", -) +Create a line with a unique color for each group in the dataset by passing the grouping column name to the `by` argument. -# Ex 2. Violin style marginals -scatter_marginal_violin = dx.scatter( - my_table, - x="petal_width", - y="petal_length", - marginal_x="violin", - marginal_y="violin", -) - -# Ex 3. Rug style marginals -scatter_marginal_rug = dx.scatter( - my_table, - x="petal_width", - y="petal_length", - marginal_x="rug", - marginal_y="rug", -) - -# Ex 4. Box style marginals -scatter_marginal_box = dx.scatter( - my_table, - x="petal_width", - y="petal_length", - marginal_x="box", - marginal_y="box", -) -``` - -### Log Axes - -```python order=scatter_plot_log -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -scatter_plot_axes_titles = dx.scatter( - my_table, - x="petal_width", - # Each y value becomes a seperate series - y="petal_length", - log_x=True, - log_y=True, -) -``` - -### Axes Range - -```python order=scatter_plot_range -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -scatter_plot_range = dx.scatter( - my_table, - x="petal_width", - # Each y value becomes a seperate series - y="petal_length", - # Set at custom range for each axes - range_x=[0,5], - range_y=[0,10], -) -``` - -### Multiple Axes - -You can create multiple axes on a single graph in a number of different ways depending on what you are trying to do. Axes can be created from columns, or by value from a column, of from multiple plots layered together. 
- -```python order=scatter_plot_title,scatter_plot_axes_titles -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. Create multiple axes from mulitple columns -scatter_plot_axes_titles = dx.scatter( - my_table, - x="sepal_width", - # Each y value becomes a seperate series - y=["sepal_length", "petal_length"], - # position each axis for each series - yaxis_sequence=[1, 2], - # Label the axes - xaxis_titles=["Sepal Width"], - yaxis_titles=["Sepal Length", "Petal Length"], -) - - -# Ex 2. Create multiple axes by values from a column -stocks_table = dx.data.stocks().where("sym in `DOG`, `CAT`") - -scatter_stocks = dx.scatter( - stocks_table, - x="timestamp", - y="price", - # Partition color by sym - color="sym", - # Apply each trace to a different axis - yaxis_sequence=[1, 2], - # Label each axis, where order is by first appearence in the data - yaxis_titles=["CAT", "DOG"], -) - -#Ex 3. Create multiple axes from multiple tables using layers -layered_table = dx.data.iris() # import the example iris data set - -# split into two tables by species -table_setosa = layered_table.where("species = `setosa`") -table_versicolor = layered_table.where("species = `versicolor`") - -# layer two plots together, layout is inherited from the last table in the layer -layered_scatter = dx.layer( - # scatter plot from table 1 - dx.scatter( - table_setosa, - x="petal_width", - y="petal_length", - color_discrete_sequence=["salmon"], - ), - # scatter from table 2 - dx.scatter( - table_versicolor, - x="petal_width", - y="petal_length", - color_discrete_sequence=["lemonchiffon"], - # place this trace on a secondary axis - yaxis_sequence=[2], - # set the titles for both axes, as layer inherits from this layout - yaxis_titles=["versicolor petal_length","setosa petal_length"] - ) -) -``` - -### Layer as Event Markers - -Combines a line plot and a scatter plot to use as event markers indicating the maximum peak in each series. 
- -```python order=scatter_as_markers,marker_table -import deephaven.plot.express as dx - -my_table = dx.data.iris() # import the example iris data set -# find the max peaks of each series to use as our example markers -marker_table = my_table.select(["species", "petal_length", "timestamp"]).join( - my_table.select(["species", "petal_length"]).max_by("species"), - on=["species", "petal_length"], -) - -# layer as scatter on a line plot -scatter_as_markers = dx.layer( - # create a scatter plot to use as markers - dx.scatter( - marker_table, - x="timestamp", - y="petal_length", - symbol_sequence=["x"], - size_sequence=[15], - hover_name="species", - color_discrete_sequence=["#FFF"], - ), - # layer it with a line plot - dx.line( - my_table, - x="timestamp", - y="petal_length", - color="species", - ), -) -``` - -### Large Data Sets - -The default `render_mode` is webgl and can comfortably plot around 0.5 - 1 million points before performance of the browser will begin to degrade. In `render_mode=svg` that drops to around 10,000 points, but may offer more accurate rendering for some GPUs. - -In situations where scatter plots become impractical due to overlaping markers in large datasets, it is advisable to consider employing a Density Heatmap (2D Histogram) as an alternative visualization method. This approach allows for data binning through the query engine, enabling visualization of billions of data points, making it more suitable for handling such scenarios. Moreover, users may benefit from a clearer interpretation of the data using this method. - -For large, but managable datasets, setting an appropriate opacity can be beneficial as it helps address data overlap issuess, making the individual data points more distinguishable and enhancing overall visualization clarity. 
- - - -```python order=density_heatmap,scatter_plot_opacity +```python order=line_plot,my_table import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# TODO: Method doesn't exist yet -# Consider a 2d Histograms for large data sets -density_heatmap = dx.density_heatmap(my_table, x="sepal_width", y="sepal_length") +my_table = dx.data.stocks() -scatter_plot_opacity = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length" - # For data sets with a high degree of overlap between points, consider setting opacity - opacity=0.5 -) +# each line represents a group and has a unique color +line_plot = dx.line(my_table, x="Timestamp", y="Price", by="Sym") ``` ## API Reference diff --git a/plugins/plotly-express/docs/multiple-axes.md b/plugins/plotly-express/docs/multiple-axes.md index e2c0af6a2..0f660d98c 100644 --- a/plugins/plotly-express/docs/multiple-axes.md +++ b/plugins/plotly-express/docs/multiple-axes.md @@ -1,5 +1,66 @@ # Multiple Axes -You can create multiple x or y axes in a single plot in a few different ways, from columns or from paritions, or as layers from multiple plots. Passing multiple columns to the `x` or `y` parameters along with setting a `y_axis_sequence` or `x_axis_sequence` will create multiple axes. Using the `by` parameter along with an axis sequence can also create multiple axes, with one for each unique value in the column. The `layer` function can also be used to create multiple axes. +Create plots with multiple axes by specifying `xaxis_sequence` or `yaxis_sequence`. Multiple axis plots are useful for visualizing the relationship between variables that have very different units or scales. In these cases, multiple axes can help display their relationship without forcing one variable to conform to the scale of the other. ## Examples + +### Multiple columns + +When two or more response variables appear in separate columns, pass their column names to the `x` or `y` arguments.
The resulting chart will have shared axes. +```python order=line_plot_shared,brazil,gapminder +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() + +# get a specific country +brazil = gapminder.where("Country == `Brazil`") + +# population and per capita gdp have very different scales and units +line_plot_shared = dx.line(brazil, x="Year", y=["Pop", "GdpPerCap"]) +``` + +The `xaxis_sequence` or `yaxis_sequence` arguments can be used to create multiple axes. + +```python order=line_plot_multi,brazil,gapminder +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() + +# get a specific country +brazil = gapminder.where("Country == `Brazil`") + +# specify multiple y-axis columns and split axes with yaxis_sequence +line_plot_multi = dx.line(brazil, x="Year", y=["Pop", "GdpPerCap"], yaxis_sequence=[1, 2]) +``` + +### Use `by` with multiple axes + +When a single response variable has observations from several groups of data, use the `by` parameter to specify the grouping column. + +```python order=line_plot_by,cat_dog,stocks +import deephaven.plot.express as dx +stocks = dx.data.stocks() + +# subset to get two symbols +cat_dog = stocks.where("Sym in `CAT`, `DOG`") + +# use `by` to specify the grouping column and order axes left to right with yaxis_sequence +line_plot_by = dx.line(cat_dog, x="Timestamp", y="Price", by="Sym", yaxis_sequence=[1, 2]) +``` + +### Layering + +Finally, plots can be layered to achieve multiple axes. Use the `dx.layer` function to accomplish this. 
+ +```python order=line_plot_layered,fish,bird,stocks +import deephaven.plot.express as dx +stocks = dx.data.stocks() + +# subset to get two tables with a shared x-axis +fish = stocks.where("Sym == `FISH`") +bird = stocks.where("Sym == `BIRD`") + +# create multiple axes using dx.layer and specifying yaxis_sequence +line_plot_layered = dx.layer( + dx.line(fish, x="Timestamp", y="Price", yaxis_sequence=1), + dx.line(bird, x="Timestamp", y="Price", yaxis_sequence=2) +) +``` \ No newline at end of file diff --git a/plugins/plotly-express/docs/ohlc.md b/plugins/plotly-express/docs/ohlc.md index 46797b653..69b2f8d15 100644 --- a/plugins/plotly-express/docs/ohlc.md +++ b/plugins/plotly-express/docs/ohlc.md @@ -1,17 +1,49 @@ # OHLC Plot -OHLC (Open-High-Low-Close) plots, are a common data visualization tool used in finance to represent the price data of a financial instrument over a specific time frame. Similar to Candlesticks, they display four key prices: the opening price, the highest price (high), the lowest price (low), and the closing price, typically as vertical bars on a chart, providing insights into price movements and trends. +OHLC (Open-High-Low-Close) plots are a common data visualization tool used in finance to represent the price data of a financial instrument over a specific time frame. Similar to Candlesticks, they display four key prices: the opening price (open), the highest price (high), the lowest price (low), and the closing price (close), typically as vertical bars on a chart, providing insights into price movements and trends. In OHLC plots, each bar consists of a vertical line with small horizontal lines on both ends. The top of the vertical line represents the high price, the bottom represents the low price, the horizontal line on the left indicates the opening price, and the horizontal line on the right signifies the closing price. 
Additionally, the color of the bar is often used to indicate whether the closing price was higher (bullish, often green) or lower (bearish, often red) than the opening price, aiding in the quick assessment of price trends and market sentiment. Analyzing the shape, color, and position of these bars helps traders and analysts assess the price movement, trends, and market sentiment within a given time frame. -OHLC (Open-High-Low-Close) plots are useful for: +### What are OHLC plots useful for? -1. **Price Trend Analysis**: OHLC charts provide a clear visual representation of price trends and movements over specific time periods, helping traders and analysts assess market direction. -2. **Support and Resistance Identification**: They aid in identifying support and resistance levels, key price points that can inform trading decisions and risk management. -3. **Quantitative Analysis**: OHLC data can be leveraged for quantitative analysis, statistical modeling, and the development of trading strategies, making them valuable in algorithmic and systematic trading. +- **Price trend analysis**: OHLC charts provide a clear visual representation of price trends and movements over specific time periods, helping traders and analysts assess market direction. +- **Identifying support and resistance**: They aid in identifying support and resistance levels, key price points that can inform trading decisions and risk management. +- **Quantitative analysis**: OHLC data can be leveraged for quantitative analysis, statistical modeling, and the development of trading strategies, making them valuable in algorithmic and systematic trading. ## Examples +### A basic OHLC plot + +Visualize the key summary statistics of a stock price as it evolves. Pass the column name of the instrument to `x`, and pass the `open`, `high`, `low`, and `close` arguments the appropriate column names. 
+ +```python order=ohlc_plot,stocks_1min_ohlc,stocks +import deephaven.plot.express as dx +import deephaven.agg as agg +stocks = dx.data.stocks() + +# compute OHLC per symbol for each minute +stocks_1min_ohlc = stocks.update_view( + "BinnedTimestamp = lowerBin(Timestamp, 'PT1m')" +).agg_by( + [ + agg.first("Open=Price"), + agg.max_("High=Price"), + agg.min_("Low=Price"), + agg.last("Close=Price"), + ], + by=["Sym", "BinnedTimestamp"], +) + +# create a basic OHLC plot - the `open`, `high`, `low`, and `close` arguments must be specified +ohlc_plot = dx.ohlc( + stocks_1min_ohlc.where("Sym == `DOG`"), + x="BinnedTimestamp", + open="Open", + high="High", + low="Low", + close="Close", +) +``` ## API Reference ```{eval-rst} diff --git a/plugins/plotly-express/docs/other.md b/plugins/plotly-express/docs/other.md deleted file mode 100644 index c7c3cb08d..000000000 --- a/plugins/plotly-express/docs/other.md +++ /dev/null @@ -1,19 +0,0 @@ -# Titles, labels and legends - -## Titles - -Title text can be added to plots using the `title` parameter. The `title` parameter accepts a string, which will be used as the title text. The `title` parameter can be used with any plot type. Titles support a limited subset of html and css styling. For example, you can use `
` to add line breaks, and `` to make text bold. Or you can use css styling to change the font size or color. - -### Examples - -## Labels - -Axis labels can be added to plots using the `labels` parameter. The `labels` parameter accepts a dictionary, which maps axis names to label text. The `labels` parameter can be used with any plot type. - -### Examples - -## Legends - -Legends are automatically added to plots when there are multiple series in the plot. Legends can be customized using the `unsafe_update_figure` parameter. For example, you can change the position of the legend, or hide the legend. The `unsafe_update_figure` parameter accepts a dictionary, which will be passed to the plotly `update_layout` function. The `unsafe_update_figure` parameter can be used with any plot type. - -### Examples diff --git a/plugins/plotly-express/docs/pie.md b/plugins/plotly-express/docs/pie.md index 31c46ed86..bfac966a2 100644 --- a/plugins/plotly-express/docs/pie.md +++ b/plugins/plotly-express/docs/pie.md @@ -2,18 +2,36 @@ A pie plot is a circular data visualization that illustrates the relative proportions of discrete categories within a dataset by dividing a circle into sectors. This format provides a quick and straightforward way to convey the composition of data. -Pie plots are useful for: +Pie plots are appropriate when the data contain a categorical variable where the frequencies of each category can be computed. -1. **Proportional Representation**: Pie plots effectively convey the proportional distribution of categories, making them useful when you want to highlight the relative size of discrete components within a whole. -2. **Simplicity**: Pie plots are straightforward to interpret and can be especially valuable when communicating data to non-technical audiences, as they provide an easily digestible overview of data composition. +### What are pie plots useful for? 
-Limitations of pie plots include: +- **Proportional representation**: Pie plots effectively convey the proportional distribution of categories, making them useful when you want to highlight the relative size of discrete components within a whole. +- **Simplicity**: Pie plots are straightforward to interpret and can be especially valuable when communicating data to non-technical audiences, as they provide an easily digestible overview of data composition. -1. **Limited Categories**: Pie plots become less effective when dealing with a large number of categories, as it can be challenging to differentiate and interpret small slices, leading to cluttered and less informative visualizations. Consider using a bar plot instead. -2. **Comparison Complexity**: Comparing the sizes of slices in a pie plot is less precise than with other chart types, such as bar plots or stacked bar charts. This makes it less suitable for situations where accurate quantitative comparisons are crucial. +Pie plots do have some limitations. They become less effective when dealing with a large number of categories, as it can be challenging to differentiate and interpret small slices, leading to cluttered and less informative visualizations. Consider using a [bar plot](bar.md) instead. ## Examples +### A basic pie plot + +Visualize the contribution of each part to the whole, arranged clockwise from greatest to least contribution. Pass the label column name to the `names` argument, and the value column name to the `values` argument. + +```python order=pie_plot,gapminder_recent_pop,gapminder +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() + +# get table of most recent total population per continent +gapminder_recent_pop = ( + gapminder + .last_by("Country") + .drop_columns(["Country", "LifeExp", "GdpPerCap"]) + .sum_by(["Year", "Month", "Continent"]) +) + +pie_plot = dx.pie(gapminder_recent_pop, names="Continent", values="Pop") +``` + ## API Reference ```{eval-rst} ..
dhautofunction:: deephaven.plot.express.pie diff --git a/plugins/plotly-express/docs/plot-by.md b/plugins/plotly-express/docs/plot-by.md index 54a79bc79..f9d2044ab 100644 --- a/plugins/plotly-express/docs/plot-by.md +++ b/plugins/plotly-express/docs/plot-by.md @@ -1,5 +1,146 @@ # Plot By -To plot multiple series from a table into a single chart, use the `by` parameter. This parameter accepts a column name or a list of column names. The chart will be partitioned by the values in the specified column(s), with one series for each unique value. Other parameters, such as `color` (for which `by` is an alias), `symbol`, `size`, `width`, and `line_dash` can also be used to partition the chart. +To plot multiple series from a table into a single chart, use the `by` parameter. This parameter accepts a column name or a list of column names denoting other variables of interest in the dataset. The chart will be partitioned by the values in the specified column(s), with one series for each unique value. Other parameters, such as `color`, `symbol`, `size`, `width`, and `line_dash` can also be used to partition the chart. + +Under the hood, the Deephaven query engine performs a `partition_by` table operation on the given grouping column to create each series. This efficient implementation means that plots with multiple groups can scale to tens of millions or even billions of rows. ## Examples + +### Scatter plot by a categorical variable + +Create a [scatter plot](scatter.md), where the color of each point is determined by a categorical grouping variable. + +```python order=petal_size_by_species,iris +import deephaven.plot.express as dx +iris = dx.data.iris() # import the example iris data set + +# specify `x` and `y` columns, as well as additional grouping variable with `by` +petal_size_by_species = dx.scatter(iris, x="PetalLength", y="PetalWidth", by="Species") +``` + +Or, use `symbol` to differentiate groups with symbols.
+ +```python order=petal_size_by_species_sym,iris +import deephaven.plot.express as dx +iris = dx.data.iris() # import the example iris data set + +# use different symbols to denote different groups +petal_size_by_species_sym = dx.scatter(iris, x="PetalLength", y="PetalWidth", symbol="Species") +``` + +### Scatter plot by a numeric variable + +Use a numeric variable with the `size` parameter to change the size of the points based on the value of the numeric variable. + +```python order=total_bill_tip_size,tips +import deephaven.plot.express as dx +tips = dx.data.tips() # import a ticking version of the Tips dataset + +# the `Size` column from tips gives the number in the party +total_bill_tip_size = dx.scatter(tips, x="TotalBill", y="Tip", size="Size") +``` + +If the sizes are too large or small, use the `size_map` argument to map each numeric value to a more appropriate size. + +```python order=total_bill_tip_size,tips +import deephaven.plot.express as dx +tips = dx.data.tips() # import a ticking version of the Tips dataset + +# the `Size` column from tips gives the number in the party, map it to different sizes +total_bill_tip_size = dx.scatter( + tips, x="TotalBill", y="Tip", size="Size", + size_map={"1": 5, "2": 7, "3": 11, "4": 13, "5": 15, "6": 17} +) +``` + +### Scatter plot by several categorical variables + +Pass two or more column names to the `by` argument to color points based on unique combinations of values. + +```python order=total_bill_sex_smoker,tips +import deephaven.plot.express as dx +tips = dx.data.tips() # import a ticking version of the Tips dataset + +# passing a list to `by` gives unique colors for each combination of values in the given columns +total_bill_sex_smoker = dx.scatter(tips, x="TotalBill", y="Tip", by=["Sex", "Smoker"]) +``` + +Alternatively, use other arguments such as `symbol` or `size` to differentiate groups.
+ +```python order=total_bill_sex_smoker_sym,tips +import deephaven.plot.express as dx +tips = dx.data.tips() # import a ticking version of the Tips dataset + +# use color to denote sex, and symbol to denote smoking status +total_bill_sex_smoker_sym = dx.scatter(tips, x="TotalBill", y="Tip", by="Sex", symbol="Smoker") +``` + +### Line plot by a categorical variable + +Use a [line plot](line.md) to track the trends of a numeric variable over time, broken into categories using `by`. + +```python order=prices_by_sym,stocks +import deephaven.plot.express as dx +stocks = dx.data.stocks() # import ticking Stocks dataset + +# use `by` argument to plot prices by stock symbol +prices_by_sym = dx.line(stocks, x="Timestamp", y="Price", by="Sym") +``` + +In the case of a line plot, `line_dash` can also be used to differentiate lines for different categories. + +```python order=prices_by_sym_dash,stocks +import deephaven.plot.express as dx +stocks = dx.data.stocks() # import ticking Stocks dataset + +# use `line_dash` argument to change line appearance per stock symbol +prices_by_sym_dash = dx.line(stocks, x="Timestamp", y="Price", line_dash="Sym") +``` + +### Histogram plot by a categorical variable + +Use `by` with [histograms](histogram.md) to visualize the distributions of multiple groups of data. Histograms can be stacked, or overlaid using `barmode="overlay"`.
+ +```python order=life_exp_hist,life_exp_hist_overlaid,recent_gapminder,gapminder +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() # import ticking Gapminder dataset + +# filter by most recent instance of each country +recent_gapminder = gapminder.last_by("Country") + +# create histogram of life expectancy distribution for each continent +life_exp_hist = dx.histogram(recent_gapminder, x="LifeExp", by="Continent") + +# overlay histograms for easier visualization +life_exp_hist_overlaid = dx.histogram(recent_gapminder, x="LifeExp", by="Continent", barmode="overlay") +``` + +### Box plot by a categorical variable + +Use `by` with [box plots](box.md) to visualize the distributions of multiple groups of data. Unlike histograms, using the `by` argument with box plots stacks them vertically. + +```python order=life_exp_box,recent_gapminder,gapminder +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() # import ticking Gapminder dataset + +# filter by most recent instance of each country +recent_gapminder = gapminder.last_by("Country") + +# box plot gives 5-number summary and potential outliers +life_exp_box = dx.box(recent_gapminder, x="LifeExp", by="Continent") +``` + +### Violin plot by a categorical variable + +Use `by` with [violin plots](violin.md) to visualize the distributions of multiple groups of data. The `by` argument for a violin plot behaves similarly to a box plot. 
+ +```python order=life_exp_violin,recent_gapminder,gapminder +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() # import ticking Gapminder dataset + +# filter by most recent instance of each country +recent_gapminder = gapminder.last_by("Country") + +# the violins may be too thin to be useful +life_exp_violin = dx.violin(recent_gapminder, x="LifeExp", by="Continent") +``` diff --git a/plugins/plotly-express/docs/scatter-3d.md b/plugins/plotly-express/docs/scatter-3d.md index c41f907a8..b003e0455 100644 --- a/plugins/plotly-express/docs/scatter-3d.md +++ b/plugins/plotly-express/docs/scatter-3d.md @@ -2,136 +2,134 @@ A 3D scatter plot is a type of data visualization that displays data points in three-dimensional space. Each data point is represented as a marker or point, and its position in the plot is determined by the values of three different variables, one for each axis (x, y, and z). This plot allows for the visualization of relationships and patterns among three continuous variables simultaneously. -A 3D scatter plot is useful for: +3D scatter plots are appropriate when a continuous response variable depends on two continuous explanatory variables. If there is an additional categorical variable that the response variable depends on, shapes or colors can be used in the scatter plot to distinguish the categories. -1. **Visualizing Multivariate Data**: When you have three variables of interest, a 3D scatter plot allows you to visualize and explore their relationships in a single plot. It enables you to see how changes in one variable affect the other two, providing a more comprehensive understanding of the data. -2. **Identifying Clusters and Patterns**: In some datasets, 3D scatter plots can reveal clusters or patterns that might not be evident in 2D scatter plots. The added dimensionality can help identify complex structures and relationships that exist in the data. -3. 
**Outlier Detection**: Outliers, which are data points that deviate significantly from the general pattern, can be more easily spotted in a 3D scatter plot. They may appear as isolated points away from the main cluster, drawing attention to potentially interesting observations or anomalies. +### What are 3D scatter plots useful for? -However, 3D scatter plots also have some limitations. When dealing with more than three variables, visual interpretation can become challenging. Overplotting (many points overlapping) can obscure patterns, and certain perspectives may lead to misinterpretation. In such cases, alternative visualizations like 3D surface plots or parallel coordinate plots might be considered. +- **Visualizing multivariate data**: When you have three variables of interest, a 3D scatter plot allows you to visualize and explore their relationships in a single plot. It enables you to see how changes in one variable affect the other two, providing a more comprehensive understanding of the data. +- **Identifying clusters and patterns**: In some datasets, 3D scatter plots can reveal clusters or patterns that might not be evident in 2D scatter plots. The added dimensionality can help identify complex structures and relationships that exist in the data. +- **Outlier detection**: Outliers, which are data points that deviate significantly from the general pattern, can be more easily spotted in a 3D scatter plot. They may appear as isolated points away from the main cluster, drawing attention to potentially interesting observations or anomalies. ## Examples -### A basic scatter plot +### A basic 3D scatter plot -Visualize the relationship between three variables. Defined as an x, y and z supplied using column names. +Visualize the relationship between three variables by passing their column names to the `x`, `y`, and `z` arguments. Click and drag on the resulting chart to rotate it for new perspectives. 
-```python order=scatter_plot,mytable +```python order=scatter_plot_3D,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +iris = dx.data.iris() -# Create a basic scatter plot by specifying the x and y column -scatter_3d_plot = dx.scatter_3d(my_table, x="sepal_width", y="sepal_length", z="petal_width") +scatter_plot_3D = dx.scatter_3d(iris, x="SepalWidth", y="SepalLength", z="PetalWidth") ``` -### 3d bubble charts sized from a column +### Create a bubble plot -A 3d bubble chart is a type of data visualization that displays data points as spheres, where the position of each sphere corresponds to three variables, and the size of the sphere represents a fourth variable. +Use the size of the markers in a 3D scatter plot to visualize a fourth quantitative variable. Such a plot is commonly called a bubble plot, where the size of each bubble corresponds to the value of the additional variable. -The size column values function as the sphere size, you may consider scaling or normalizing these values before plotting the bubble chart. +The `size` argument interprets the values in the given column as pixel size, so you may consider scaling or normalizing these values before creating the bubble chart. -```python order=bubble_3d_plot +```python order=bubble_plot_3D,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +iris = dx.data.iris() -# Sets size of the circle using values from a column sized in pixels -bubble_3d_plot = dx.scatter_3d(my_table, x="sepal_width", y="sepal_length", z="petal_width", size="petal_length") +bubble_plot_3D = dx.scatter_3d(iris, x="SepalWidth", y="SepalLength", z="PetalWidth", size="PetalLength") ``` -### Color scatter plot by group +### Color markers by group -Plot values by group. The query engine performs a `parition_by` on the given color column to create each series. +Denote groups of data by using the color of the markers as group indicators. 
Pass the name of the grouping column(s) to the `by` argument. -```python order=scatter_plot,mytable +```python order=scatter_plot_3D_groups,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +iris = dx.data.iris() -# Assign unique colors to each grouping key in a column -scatter_3d_plot = dx.scatter_3d(my_table, x="sepal_width", y="sepal_length", z="petal_width", color="species") +scatter_plot_3D_groups = dx.scatter_3d(iris, x="SepalWidth", y="SepalLength", z="PetalWidth", by="Species") ``` -### Color using a continuous color scale +Customize these colors using the `color_discrete_sequence` or `color_discrete_map` arguments. Any [CSS color name](https://www.w3schools.com/cssref/css_colors.php), hexadecimal color code, or set of RGB values will work. -Colors can be set to a continuous scale, instead of by group as above. Use any of the built in color scales, or specify a custom scale. - - - -```python order=scatter_plot_color_by,scatter_plot_color_custom +```python order=scatter_3D_custom_1,scatter_3D_custom_2,scatter_3D_custom_3,iris,iris_with_custom_colors import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. Use built in color scales -scatter_3d_plot_color_by = dx.scatter_3d( - my_table, - x="sepal_width", - y="sepal_length", - z="petal_width", - color="petal_length", - # use any plotly express built in color scale names - color_continuous_scale="viridis" -) - -# Ex 2. 
Use a custom color scale -scatter_plot_color_custom = dx.scatter_3d( - my_table, - x="sepal_width", - y="sepal_length", - z="petal_width", - color="petal_length", - # custom scale colors can be any valid browser css color - color_continuous_scale=["lemonchiffon", "#FA8173", "rgb(201, 61, 44)"] -) -``` - -### Color using custom discrete colors - -```python order=scatter_plot_color_sequence,scatter_plot_color_map,scatter_plot_color_column -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. Set custom colors -scatter_plot_color_sequence = dx.scatter_3d( - my_table, - x="sepal_width", - y="sepal_length", - z="petal_width", - # group colors by a column - color="species", +iris = dx.data.iris() + +# set custom colors using color_discrete_sequence +scatter_3D_custom_1 = dx.scatter_3d( + iris, + x="SepalWidth", + y="SepalLength", + z="PetalWidth", + by="Species", # A list of colors to sequentially apply to one or more series # The colors loop if there are more series than colors color_discrete_sequence=["salmon", "#fffacd", "rgb(100,149,237)"] ) -# Ex 2. Set trace colors from a map of colors -scatter_plot_color_map = dx.scatter_3d( - my_table, - x="sepal_width", - y="sepal_length", - z="petal_width", - # group colors by a column - color="species", +# use a dictionary to specify custom colors +scatter_3D_custom_2 = dx.scatter_3d( + iris, + x="SepalWidth", + y="SepalLength", + z="PetalWidth", + by="Species", # set each series to a specific color color_discrete_map={"virginica":"lemonchiffon", "setosa": "cornflowerblue", "versicolor":"#FA8173"} ) -# Ex 3. 
Set colors using values from a column -# Generate a column of valid CSS colors to use as an example -table_with_column_of_colors = my_table.update( - "example_colors = `rgb(` + Math.round(Math.random() * 255) + `,` + Math.round(Math.random() * 255) + `,` + Math.round(Math.random() * 255) +`)`" +# or, create a new table with a column of colors, and use that column for the color values +iris_with_custom_colors = iris.update( + "ExampleColors = `rgb(` + Math.round(Math.random() * 255) + `,` + Math.round(Math.random() * 255) + `,` + Math.round(Math.random() * 255) +`)`" ) -scatter_plot_color_column = dx.scatter_3d( - table_with_column_of_colors, - x="sepal_width", - y="sepal_length", - z="petal_width", - color="example_colors", +scatter_3D_custom_3 = dx.scatter_3d( + iris_with_custom_colors, + x="SepalWidth", + y="SepalLength", + z="PetalWidth", + by="ExampleColors", # When set to `identity`, the column data passed to the # color parameter will used as the actual color color_discrete_map="identity" ) ``` +### Color markers by a continuous variable + +Markers can also be colored by a continuous value by specifying the `color_continuous_scale` argument. + +```python order=scatter_3D_color,iris +import deephaven.plot.express as dx +iris = dx.data.iris() + +# pass the value column to the `by` argument, and use `color_continuous_scale` to specify the color scale +scatter_3D_color = dx.scatter_3d( + iris, + x="SepalWidth", + y="SepalLength", + z="PetalWidth", + by="PetalLength", + # use any plotly express built in color scale name + color_continuous_scale="viridis" +) +``` + +Or, supply your own custom color scale to `color_continuous_scale`. 
+ +```python order=scatter_3D_custom_color,iris +import deephaven.plot.express as dx +iris = dx.data.iris() + +scatter_3D_custom_color = dx.scatter_3d( + iris, + x="SepalWidth", + y="SepalLength", + z="PetalWidth", + by="PetalLength", + # custom scale colors can be any valid browser css color + color_continuous_scale=["lemonchiffon", "#FA8173", "rgb(201, 61, 44)"] +) +``` + ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.scatter_3d diff --git a/plugins/plotly-express/docs/scatter-polar.md b/plugins/plotly-express/docs/scatter-polar.md index de07d79a1..286c26654 100644 --- a/plugins/plotly-express/docs/scatter-polar.md +++ b/plugins/plotly-express/docs/scatter-polar.md @@ -2,14 +2,27 @@ Polar scatter plots are a data visualization method that represents data points on a polar coordinate system. They display individual data points as dots or markers in a circular plot, providing a means to visualize the distribution, relationships, or patterns in data with angular or directional dependencies. Polar scatter plots are particularly useful for exploring data within a circular context, where the angle or periodic nature of the data is a significant aspect. -Polar scatter plots are useful for: +Polar scatter plots are appropriate when the data contain a continuous variable represented in polar coordinates, with a radial and an angular component instead of the typical x and y components. -1. **Cyclical Data Analysis**: Polar scatter plots are valuable for analyzing data with cyclical or periodic patterns, as they enable the visualization of cyclic trends and periodic variations within the data. -2. **Directional Data Representation**: They are used to represent directional data, such as wind directions, compass bearings, or angular measurements, providing a visual means to explore data with specific orientations. -3. 
**Angular or Periodic Data Relationships**: Polar scatter plots aid in exploring relationships and correlations in data with angular or periodic dependencies, making them suitable for applications where understanding circular patterns is essential. +### What are polar scatter plots useful for? + +- **Analyzing cyclical data**: Polar scatter plots are valuable for analyzing data with cyclical or periodic patterns, as they enable the visualization of cyclic trends and periodic variations within the data. +- **Representing directional data**: They are used to represent directional data, such as wind directions, compass bearings, or angular measurements, providing a visual means to explore data with specific orientations. +- **Exploring angular or periodic relationships**: Polar scatter plots aid in exploring relationships and correlations in data with angular or periodic dependencies, making them suitable for applications where understanding circular patterns is essential. ## Examples +### A basic polar scatter plot + +Visualize a dataset in polar coordinates by passing column names to the `r` and `theta` arguments. `theta` may be a column of strings giving cardinal directions, as in this case. `theta` also supports the use of numeric types that may represent radians or degrees, depending on how the `range_theta` argument is supplied. + +```python order=polar_scatter_plot,wind +import deephaven.plot.express as dx +wind = dx.data.wind() + +# `by` is used to separate data by groups +polar_scatter_plot = dx.scatter_polar(wind, r="Frequency", theta="Direction", by="Strength") +``` ## API Reference ```{eval-rst} diff --git a/plugins/plotly-express/docs/scatter-ternary.md b/plugins/plotly-express/docs/scatter-ternary.md index a84625fd1..413b20e0e 100644 --- a/plugins/plotly-express/docs/scatter-ternary.md +++ b/plugins/plotly-express/docs/scatter-ternary.md @@ -2,14 +2,27 @@ Ternary scatter plots are a data visualization method used to represent data within a triangular coordinate system. 
They display individual data points as markers or dots in the triangular space, offering a means to visualize the distribution, relationships, or patterns in data that consist of three mutually exclusive components. Ternary scatter plots are particularly useful when analyzing compositional data or data involving proportions that sum to a constant total. -Ternary scatter plots are useful for: +Ternary scatter plots are appropriate when the data contain three interrelated mutually exclusive categories whose relationships can be quantified with a continuous variable. -1. **Compositional Data Analysis**: Ternary scatter plots are useful for analyzing data where proportions of three components add up to a constant total. They help visualize the distribution of these components and their relationships within the composition. -2. **Multivariate Data Exploration**: They can be applied in multivariate data analysis to visualize relationships, patterns, and trends among three variables or components, particularly when these components are interrelated. -3. **Optimization Studies**: Ternary scatter plots aid in optimization studies to understand how adjustments in the proportions of three components impact the overall composition, making them valuable in informed decision-making processes. +### What are ternary scatter plots useful for? + +- **Compositional data analysis**: Ternary scatter plots are useful for analyzing data where proportions of three components add up to a constant total. They help visualize the distribution of these components and their relationships within the composition. +- **Exploring multivariate data**: They can be applied in multivariate data analysis to visualize relationships, patterns, and trends among three variables or components, particularly when these components are interrelated. 
+- **Optimization studies**: Ternary scatter plots aid in optimization studies to understand how adjustments in the proportions of three components impact the overall composition, making them valuable in informed decision-making processes. ## Examples +### A basic ternary scatter plot + +Visualize a ternary dataset by passing column names to each of the `a`, `b`, and `c` arguments. + +```python order=ternary_scatter_plot,election +import deephaven.plot.express as dx +election = dx.data.election() + +# create a ternary scatter plot by specifying the columns for the three points of the triangle +ternary_scatter_plot = dx.scatter_ternary(election, a="Joly", b="Coderre", c="Bergeron") +``` ## API Reference ```{eval-rst} diff --git a/plugins/plotly-express/docs/scatter.md b/plugins/plotly-express/docs/scatter.md index bba75cff1..29e946ae9 100644 --- a/plugins/plotly-express/docs/scatter.md +++ b/plugins/plotly-express/docs/scatter.md @@ -2,312 +2,262 @@ A scatter plot is a type of data visualization that uses Cartesian coordinates to display values for typically two variables. It represents individual data points as dots on a graph, with each dot's position indicating its corresponding values on the two variables being plotted. -Scatter plots are useful for: +Scatter plots are appropriate when the data contain a continuous response variable that directly depends on a continuous explanatory variable. If there is an additional categorical variable that the response variable depends on, shapes or colors can be used in the scatter plot to distinguish the categories. For large datasets (> 1 million points), consider using a [density heatmap](density_heatmap.md) instead of a scatter plot. -1. **Relationship Exploration**: Scatter plots are useful for exploring and visualizing the relationship between two continuous variables. By plotting the data points, you can quickly identify patterns, trends, or correlations between the variables. 
It helps in understanding how changes in one variable affect the other. -2. **Outlier Detection**: Scatter plots are effective in identifying outliers or extreme values in a dataset. Outliers appear as points that deviate significantly from the general pattern of the data. By visualizing the data in a scatter plot, you can easily spot these outliers, which may be important in certain analyses. -3. **Clustering Analysis**: If you suspect that your data might exhibit clusters or groups, a scatter plot can help you identify those clusters. By observing the distribution of the points, you can visually determine if there are distinct groups forming or if the points are evenly spread out. -4. **Comparison of Data Sets**: Scatter plots are useful for comparing two different data sets or groups. By plotting them on the same scatter plot, you can visually assess similarities, differences, or relationships between the two sets of data. This can be particularly helpful in scientific research, social sciences, or business analytics. -5. **Visualizing Multivariate Data**: While scatter plots typically display two variables, you can extend them to visualize multivariate data by using different techniques. For example, you can represent the third variable through the size or color of the data points, creating a 3D scatter plot or using bubble charts. -6. **Create Event Markers**: You can layer a scatter plot on top of another plot type to act as markers on the line and represent events. For example, on a line plot you can layer a scatter plot on top to act as markers representing a stock split or dividend being payed at a certain time. +### What are scatter plots useful for? -Remember that the choice of plot depends on the nature of your data, the specific questions you want to answer, and the insights you want to gain. 
While scatter plots are versatile and provide valuable information about relationships between variables, other types of plots such as bar charts, line graphs, or histograms may be more appropriate for different scenarios. +- **Exploring relationships**: Scatter plots are useful for exploring and visualizing the relationship between two continuous variables. By plotting the data points, you can quickly identify patterns, trends, or correlations between the variables. It helps in understanding how changes in one variable affect the other. +- **Outlier detection**: Scatter plots are effective in identifying outliers or extreme values in a dataset. Outliers appear as points that deviate significantly from the general pattern of the data. By visualizing the data in a scatter plot, you can easily spot these outliers, which may be important in certain analyses. +- **Clustering analysis**: If you suspect that your data might exhibit clusters or groups, a scatter plot can help you identify those clusters. By observing the distribution of the points, you can visually determine if there are distinct groups forming or if the points are evenly spread out. ## Examples ### A basic scatter plot -Visualize the relationship between two variables. Defined as an x and y pair supplied using column names. +Visualize the relationship between two variables by passing each column name to the `x` and `y` arguments. 
-```python order=scatter_plot,mytable +```python order=scatter_plot,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +iris = dx.data.iris() -# Create a basic scatter plot by specifying the x and y column -scatter_plot = dx.scatter(my_table, x="sepal_width", y="sepal_length") +scatter_plot = dx.scatter(iris, x="SepalWidth", y="SepalLength") ``` -### Bubble charts sized from a column +### Create a bubble plot -A bubble chart is a type of data visualization that displays data points as circles, where the position of each circle corresponds to two variables, and the size of the circle represents a third variable. +Use the `size` argument to resize the markers by a third quantitative variable. Such a plot is commonly called a bubble plot, where the size of each bubble corresponds to the value of the additional variable. -The size column values function as the pixel size, you may consider scaling or normalizing these values before plotting the bubble chart. +The `size` argument interprets the values in the given column as pixel size, so you may consider scaling or normalizing these values before creating the bubble chart. -```python order=bubble_plot,mytable +```python order=bubble_plot,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +iris = dx.data.iris() -# Sets size of the circle using values from a column sized in pixels -bubble_plot = dx.scatter(my_table, x="sepal_width", y="sepal_length", size="petal_length") +bubble_plot = dx.scatter(iris, x="SepalWidth", y="SepalLength", size="PetalLength") ``` -### Color scatter plot by group +### Color markers by group -Plot values by group. The query engine performs a `parition_by` on the given color column to create each series. +Denote groups of data with marker color by passing the grouping column name to the `by` argument. 
-```python order=scatter_plot,mytable +```python order=scatter_plot_groups,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +iris = dx.data.iris() -# Assign unique colors to each grouping key in a column -scatter_plot_by_group = dx.scatter(my_table, x="sepal_width", y="sepal_length", color="species") +scatter_plot_groups = dx.scatter(iris, x="SepalWidth", y="SepalLength", by="Species") ``` -### Color using a continuous color scale +Customize these colors using the `color_discrete_sequence` or `color_discrete_map` arguments. Any [CSS color name](https://www.w3schools.com/cssref/css_colors.php), hexadecimal color code, or set of RGB values will work. -Colors can be set to a continuous scale, instead of by group as above. Use any of the built in color scales, or specify a custom scale. - - - -```python order=scatter_plot_color_by,scatter_plot_color_custom -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. Use built in color scales -scatter_plot_color_by = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - color="petal_length", - # use any plotly express built in color scale names - color_continuous_scale="viridis" -) - -# Ex 2. Use a custom color scale -scatter_plot_color_custom = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - color="petal_length", - # custom scale colors can be any valid browser css color - color_continuous_scale=["lemonchiffon", "#FA8173", "rgb(201, 61, 44)"] -) -``` - -### Color using custom discrete colors - -```python order=scatter_plot_color_sequence,scatter_plot_color_map,scatter_plot_color_column +```python order=custom_colors_1,custom_colors_2,custom_colors_3,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. 
Set custom colors -scatter_plot_color_sequence = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - # group colors by a column - color="species", +iris = dx.data.iris() + +# use a list +custom_colors_1 = dx.scatter( + iris, + x="SepalWidth", + y="SepalLength", + by="Species", # A list of colors to sequentially apply to one or more series # The colors loop if there are more series than colors color_discrete_sequence=["salmon", "#fffacd", "rgb(100,149,237)"] ) -# Ex 2. Set trace colors from a map of colors -scatter_plot_color_map = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - # group colors by a column - color="species", +# or a dictionary +custom_colors_2 = dx.scatter( + iris, + x="SepalWidth", + y="SepalLength", + by="Species", # set each series to a specific color color_discrete_map={"virginica":"lemonchiffon", "setosa": "cornflowerblue", "versicolor":"#FA8173"} ) -# Ex 3. Set colors using values from a column -# Generate a column of valid CSS colors to use as an example -table_with_column_of_colors = my_table.update( +# or, create a new table with a column of colors, and use that column for the color values +iris_with_custom_colors = iris.update( "example_colors = `rgb(` + Math.round(Math.random() * 255) + `,` + Math.round(Math.random() * 255) + `,` + Math.round(Math.random() * 255) +`)`" ) -scatter_plot_color_column = dx.scatter( - table_with_column_of_colors, - x="sepal_width", - y="sepal_length", - color="example_colors", +custom_colors_3 = dx.scatter( + iris_with_custom_colors, + x="SepalWidth", + y="SepalLength", + by="example_colors", # When set to `identity`, the column data passed to the - # color parameter will used as the actual color - color_discrete_map="identity", + # grouping/color parameter will be used as the actual color + color_discrete_map="identity" ) ``` -### Symbols by group +### Color markers by a continuous variable -Symbols can be statically assigned, assigned to a group as part of a `partition_by` 
operation drawing from a sequence, or from a map. See the symbol list for all available symbols. +Markers can also be colored by a continuous value by specifying the `color_continuous_scale` argument. - - -```python order=scatter_plot_diamonds,scatter_plot_symbol_by,scatter_plot_symbol_map +```python order=scatter_plot_conts,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. Assign a custom symbol -scatter_plot_symbol = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - # See list of available symbols. - symbol_sequence=["diamond"] +iris = dx.data.iris() + +scatter_plot_conts = dx.scatter( + iris, + x="SepalWidth", + y="SepalLength", + by="PetalLength", + # use any plotly express built in color scale name + color_continuous_scale="viridis" ) +``` -# Ex 2. Use symbols to differentiate groups -scatter_plot_symbol_by = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - color="species", - # Assign symbols by group, shown using default symbol_sequence - symbol="species" -) +Or, supply your own custom color scale to `color_continuous_scale`. -# Ex 3. Use a map to assign symbols to groups -scatter_plot_symbol_map = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - color="species", - # Using a map for symbols by value - symbol="species", - symbol_map={"setosa":"cross", "versicolor":"pentagon", "virginica":"star"} +```python order=custom_colors_conts,iris +import deephaven.plot.express as dx +iris = dx.data.iris() + +custom_colors_conts = dx.scatter( + iris, + x="SepalWidth", + y="SepalLength", + by="PetalLength", + # custom scale colors can be any valid browser css color + color_continuous_scale=["lemonchiffon", "#FA8173", "rgb(201, 61, 44)"] ) ``` -### Error Bars +### Unique symbols by group -Error bars can be set on x and/or y, using values from a column. 
+Rather than using the color of the markers to visualize groups, you can use different symbols for each group with the `symbol`, `symbol_map`, or `symbol_sequence` arguments. -```python order=scatter_plot_error,scatter_plot_error_minus +```python order=scatter_plot_symbol_1,scatter_plot_symbol_2,scatter_plot_symbol_3,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. Use values from a column as positive and negative error bars -scatter_plot_error = dx.scatter( - my_table.update("error_sepal_width = sepal_width * 0.01"), - x="sepal_width", - y="sepal_length", - error_x="error_sepal_width", +iris = dx.data.iris() + +# assign the grouping column to the `symbol` argument, and plotly will pick a symbol for each group +scatter_plot_symbol_1 = dx.scatter( + iris, + x="SepalWidth", + y="SepalLength", + by="Species", + # Assign symbols by group, shown using default symbol_sequence + symbol="Species" ) -#Ex 2. Use values from two columns for y-positive-error and y-negative-error -scatter_plot_error_minus = dx.scatter( - my_table.update( - [ - # let's pretend these columns represent error - "error_sepal_length_positive = petal_width * 0.25", - "error_sepal_length_negative = petal_length * 0.25", - ] - ), - x="sepal_width", - y="sepal_length", - # will be use as positive and negative error unless _minus is set - error_y="error_sepal_length_positive", - error_y_minus="error_sepal_length_negative", +# or, assign a sequence of symbols to the `symbol_sequence` argument +scatter_plot_symbol_2 = dx.scatter( + iris, + x="SepalWidth", + y="SepalLength", + # See list of available symbols. 
+ symbol_sequence=["diamond", "circle", "triangle"] +) + +# use `symbol_map` to assign a particular symbol to each group +scatter_plot_symbol_3 = dx.scatter( + iris, + x="SepalWidth", + y="SepalLength", + by="Species", + # Using a map for symbols by value + symbol="Species", + symbol_map={"setosa":"cross", "versicolor":"pentagon", "virginica":"star"} ) ``` -### Labels and Hover Text +### Rename axes - +Use the `labels` argument or the `xaxis_titles` and `yaxis_titles` arguments to change the names of the axis labels. -```python order=scatter_plot_title,scatter_plot_axes_titles +```python order=scatter_plot_labels_1,scatter_plot_labels_2,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Ex 1. Label axes using a map -scatter_plot_title = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - # Adds a title label, title supports a subset of html and css - title="Iris Scatter Plot", - # re-label the axis - labels={"sepal_width": "Sepal Width", "sepal_length": "Sepal Length"}, - # adds values from a column as bolded text to hover tooltip - hover_name="species" +iris = dx.data.iris() + +# pass a dict of axis names to the `labels` argument to rename the axes +scatter_plot_labels_1 = dx.scatter( + iris, + x="SepalWidth", + y="SepalLength", + # relabel axes with a dict + labels={"SepalWidth": "Sepal Width", "SepalLength": "Sepal Length"}, ) -# Ex 2. 
Label multiple axes using an array of strings -scatter_plot_axes_titles = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - xaxis_titles=["Sepal Width"], - yaxis_titles=["Sepal Length"], +# or, pass a new label to each of `xaxis_titles` and `yaxis_titles` +scatter_plot_labels_2 = dx.scatter( + iris, + x="SepalWidth", + y="SepalLength", + # relabel axes with separate strings + xaxis_titles="Sepal Width", + yaxis_titles="Sepal Length", ) ``` ### Marginals -Plot marginals are additional visual representations, like histograms or density plots, displayed alongside the main plot to provide insights into the individual distributions of variables being analyzed. They enhance the understanding of data patterns and trends by showing the univariate distribution of each variable in conjunction with the main plot's visualization. +Plot marginals are additional visual representations, like [histograms](histogram.md) or [violin plots](violin.md), displayed alongside the main plot to provide insights into the individual distributions of variables being analyzed. Use the `marginal_x` and `marginal_y` arguments to plot marginals. -```python order=scatter_marginal_histogram,scatter_marginal_violin,scatter_marginal_rug,scatter_marginal_box +```python order=scatter_marginal_histogram,scatter_marginal_violin,scatter_marginal_box,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +iris = dx.data.iris() -# Ex 1. Histogram style marginals +# histogram style marginals scatter_marginal_histogram = dx.scatter( - my_table, - x="petal_width", - y="petal_length", + iris, + x="PetalWidth", + y="PetalLength", marginal_x="histogram", marginal_y="histogram", ) -# Ex 2. Violin style marginals +# violin style marginals scatter_marginal_violin = dx.scatter( - my_table, - x="petal_width", - y="petal_length", + iris, + x="PetalWidth", + y="PetalLength", marginal_x="violin", marginal_y="violin", ) -# Ex 3. 
Rug style marginals -scatter_marginal_rug = dx.scatter( - my_table, - x="petal_width", - y="petal_length", - marginal_x="rug", - marginal_y="rug", -) - -# Ex 4. Box style marginals +# box style marginals scatter_marginal_box = dx.scatter( - my_table, - x="petal_width", - y="petal_length", + iris, + x="PetalWidth", + y="PetalLength", marginal_x="box", marginal_y="box", ) ``` -### Log Axes +### Log axes + +Use `log_x` or `log_y` to use log-scale axes in your plot. -```python order=scatter_plot_log +```python order=scatter_plot_log_axes,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +iris = dx.data.iris() -scatter_plot_axes_titles = dx.scatter( - my_table, - x="petal_width", - # Each y value becomes a seperate series - y="petal_length", +# create log axes +scatter_plot_log_axes = dx.scatter( + iris, + x="PetalWidth", + y="PetalLength", log_x=True, log_y=True, ) ``` -### Axes Range +### Rescale axes + +Use `range_x` or `range_y` to set the range values of each axis explicitly. -```python order=scatter_plot_range +```python order=scatter_plot_range_axes,iris import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -scatter_plot_range = dx.scatter( - my_table, - x="petal_width", - # Each y value becomes a seperate series - y="petal_length", - # Set at custom range for each axes +iris = dx.data.iris() + +# set the axis range explicitly +scatter_plot_range_axes = dx.scatter( + iris, + x="PetalWidth", + y="PetalLength", range_x=[0,5], range_y=[0,10], ) @@ -319,14 +269,14 @@ You can create multiple axes on a single graph in a number of different ways dep ```python order=scatter_plot_title,scatter_plot_axes_titles import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +iris = dx.data.iris() -# Ex 1. 
Create multiple axes from mulitple columns +# create multiple axes from multiple columns scatter_plot_axes_titles = dx.scatter( - my_table, - x="sepal_width", - # Each y value becomes a seperate series - y=["sepal_length", "petal_length"], + iris, + x="SepalWidth", + # each y value becomes a separate series + y=["SepalLength", "PetalLength"], # position each axis for each series yaxis_sequence=[1, 2], # Label the axes @@ -335,63 +285,62 @@ scatter_plot_axes_titles = dx.scatter( ) -# Ex 2. Create multiple axes by values from a column +# create multiple axes by values from a column stocks_table = dx.data.stocks().where("sym in `DOG`, `CAT`") scatter_stocks = dx.scatter( stocks_table, - x="timestamp", - y="price", - # Parition color by sym - color="sym", + x="Timestamp", + y="Price", + by="Sym", # Apply each trace to a different axis yaxis_sequence=[1, 2], # Label each axis, where order is by first appearence in the data yaxis_titles=["CAT", "DOG"], ) -#Ex 3. Create multiple axes from multiple tables using layers +# create multiple axes from multiple tables using layers layered_table = dx.data.iris() # import the example iris data set # split into two tables by species -table_setosa = layered_table.where("species = `setosa`") -table_versicolor = layered_table.where("species = `versicolor`") +table_setosa = layered_table.where("Species = `setosa`") +table_versicolor = layered_table.where("Species = `versicolor`") # layer two plots together, layout is inherited from the last table in the layer layered_scatter = dx.layer( # scatter plot from table 1 dx.scatter( table_setosa, - x="petal_width", - y="petal_length", + x="PetalWidth", + y="PetalLength", color_discrete_sequence=["salmon"], ), # scatter from table 2 dx.scatter( table_versicolor, - x="petal_width", - y="petal_length", + x="PetalWidth", + y="PetalLength", color_discrete_sequence=["lemonchiffon"], # place this trace on a secondary axis yaxis_sequence=[2], # set the titles for both axes, as layer inherits from
this layout - yaxis_titles=["versicolor petal_length","setosa petal_length"] + yaxis_titles=["Versicolor Petal Length","Setosa Petal Length"] ) ) ``` -### Layer as Event Markers +### Layer event markers Combines a line plot and a scatter plot to use as event markers indicating the maximum peak in each series. ```python order=scatter_as_markers,marker_table import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set +iris = dx.data.iris() # import the example iris data set # find the max peaks of each series to use as our example markers -marker_table = my_table.select(["species", "petal_length", "timestamp"]).join( - my_table.select(["species", "petal_length"]).max_by("species"), - on=["species", "petal_length"], +marker_table = iris.select(["Species", "PetalLength", "Timestamp"]).join( + iris.select(["Species", "PetalLength"]).max_by("Species"), + on=["Species", "PetalLength"], ) # layer as scatter on a line plot @@ -399,48 +348,42 @@ scatter_as_markers = dx.layer( # create a scatter plot to use as markers dx.scatter( marker_table, - x="timestamp", - y="petal_length", + x="Timestamp", + y="PetalLength", symbol_sequence=["x"], size_sequence=[15], - hover_name="species", + hover_name="Species", color_discrete_sequence=["#FFF"], ), # layer it with a line plot dx.line( - my_table, - x="timestamp", - y="petal_length", - color="species", + iris, + x="Timestamp", + y="PetalLength", + by="Species", ), ) ``` -### Large Data Sets +### Large data sets -The default `render_mode` is webgl and can comfortably plot around 0.5 - 1 million points before performance of the browser will begin to degrade. In `render_mode=svg` that drops to around 10,000 points, but may offer more accurate rendering for some GPUs. +Deephaven's scatter plots can comfortably render around 0.5 to 1 million points before browser performance begins to degrade.
For large datasets under 1 million observations, setting an appropriate marker opacity and/or marker size can provide a much clearer picture of the data. If the number of points is expected to exceed 1 million, consider employing a [density heatmap](density_heatmap.md) as an alternative visualization method, which can easily summarize billions of data points in a single plot. -In situations where scatter plots become impractical due to overlaping markers in large datasets, it is advisable to consider employing a Density Heatmap (2D Histogram) as an alternative visualization method. This approach allows for data binning through the query engine, enabling visualization of billions of data points, making it more suitable for handling such scenarios. Moreover, users may benefit from a clearer interpretation of the data using this method. + ```python order=heatmap_replacement,scatter_plot_opacity +from deephaven.plot import express as dx +from deephaven import empty_table -For large, but managable datasets, setting an appropriate opacity can be beneficial as it helps address data overlap issuess, making the individual data points more distinguishable and enhancing overall visualization clarity. 
+large_data = empty_table(1_000_000).update([ + "X = 50 + 25 * cos(i * Math.PI / 180)", + "Y = 50 + 25 * sin(i * Math.PI / 180)", +]) -[Density Heatmap](density_heatmap.md) +# heatmap can be a good alternative to scatter plots with many points +heatmap_replacement = dx.density_heatmap(large_data, x="X", y="Y", range_bins_x=[0,100], range_bins_y=[0,100]) -```python order=density_heatmap,scatter_plot_opacity -import deephaven.plot.express as dx -my_table = dx.data.iris() # import the example iris data set - -# Consider a density heatmap for large data sets -heatmap_replacement = dx.density_heatmap(my_table, x="sepal_width", y="sepal_length") - -scatter_plot_opacity = dx.scatter( - my_table, - x="sepal_width", - y="sepal_length", - # For data sets with a high degree of overlap between points, consider setting opacity - opacity=0.5 -) -``` +# alternatively, consider a scatter plot with reduced opacity +scatter_plot_opacity = dx.scatter(large_data, x="X", y="Y", range_x=[0,100], range_y=[0,100], opacity=0.01) + ``` ## API Reference ```{eval-rst} diff --git a/plugins/plotly-express/docs/sidebar.json b/plugins/plotly-express/docs/sidebar.json index 26395c9cf..3278c9613 100644 --- a/plugins/plotly-express/docs/sidebar.json +++ b/plugins/plotly-express/docs/sidebar.json @@ -34,8 +34,8 @@ "path": "candlestick.md" }, { - "label": "ECDF", - "path": "ecdf.md" + "label": "Density Heatmap", + "path": "density_heatmap.md" }, { "label": "Funnel", @@ -133,10 +133,6 @@ { "label": "Multiple axes", "path": "multiple-axes.md" - }, - { - "label": "Titles and legends", - "path": "other.md" } ] } diff --git a/plugins/plotly-express/docs/strip.md b/plugins/plotly-express/docs/strip.md index 992c5291c..ef8c37bd7 100644 --- a/plugins/plotly-express/docs/strip.md +++ b/plugins/plotly-express/docs/strip.md @@ -1,18 +1,46 @@ # Strip Plot -In a strip plot, individual data points are displayed along a single axis, providing a clear view of the distribution of data points without the additional 
density estimation and summary statistics provided by a violin plot. While strip plots offer a detailed look at individual data points, they may not convey the overall distribution shape and density as effectively as a violin plot. The choice between strip plots and violin plots depends on the specific analytical needs and the level of detail required for data representation. +In a strip plot, individual data points are displayed along a single axis, providing a clear view of the distribution of data points without the additional density estimation and summary statistics provided by a violin plot. By default, the plotted categories are ordered by their appearance in the dataset. -Strip plots are useful for: +Strip plots are appropriate when the data contain a continuous variable of interest. If there is an additional categorical variable that the variable of interest depends on, stacked strip plots may be appropriate. The data should be relatively sparse, as strip plots can get crowded quickly with large datasets. This may make it difficult to spot multimodal distributions, heavy-tailed distributions, or outliers. In such cases, [box plots](box.md) or [violin plots](violin.md) may be more appropriate. -1. **Individual Data Points**: Displaying individual data points along an axis, allowing for a detailed view of each data point in the dataset. -2. **Identifying Outliers**: Facilitating the easy identification of outliers and anomalies within the data, aiding in data quality assessment. -3. **Small to Moderate Dataset Visualization**: Suitable for datasets of small to moderate sizes where individual data points can be effectively represented. -4. **Comparing Data Categories**: Comparing the distribution and spread of data across different categories or groups, making it useful for categorical data analysis. +### What are strip plots useful for? 
+ +- **Comparing data categories**: Strip plots effectively present the distribution of a dataset, and make it easy to compare the distributions of different categories of data. +- **Identifying outliers**: Because strip plots are made up of individual points, they are well-suited for identifying potential outliers in datasets. +- **Small to moderate dataset visualization**: Strip plots are suitable for visualizing the distribution of small to moderate-sized datasets, where individual data points can be effectively represented. ## Examples +### A basic strip plot + +Visualize the distribution of a continuous variable by passing its column name to the `x` or `y` arguments. + +```python order=strip_plot,thursday_tips,tips +import deephaven.plot.express as dx +tips = dx.data.tips() + +# subset to get a single group +thursday_tips = tips.where("Day == `Thur`") + +strip_plot = dx.strip(thursday_tips, x="TotalBill", color_discrete_sequence=["lightgreen"]) +``` + +### Distributions for multiple groups + +Strip plots are useful for comparing the distributions of two or more groups of data. Pass the name of the grouping column(s) to the `by` argument. + +```python order=strip_plot_group,tips +import deephaven.plot.express as dx +tips = dx.data.tips() + +strip_plot_group = dx.strip(tips, x="TotalBill", by="Day", color_discrete_sequence=["lightgreen", "lightblue", "goldenrod", "lightcoral"]) +``` + +> [!NOTE] +> At the moment, `color_discrete_sequence` must be specified explicitly to get the points to render. ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.strip -``` \ No newline at end of file +``` diff --git a/plugins/plotly-express/docs/sub-plots.md b/plugins/plotly-express/docs/sub-plots.md index 9723c5b81..3c9619b03 100644 --- a/plugins/plotly-express/docs/sub-plots.md +++ b/plugins/plotly-express/docs/sub-plots.md @@ -1,11 +1,40 @@ # Sub plots -Multiple sub plots can be combined into one plot using the `make_subplots` function. 
This function accepts multiple plot objects, and returns a single plot object. The plot objects can be any of the plot types supported by Deephaven Plotly Express. They can be arranged in a grid, or in a single row or column. The `shared_xaxes` and `shared_yaxes` parameters can be used to share axes between plots. +Multiple sub plots can be combined into one plot using the `make_subplots` function. This function accepts multiple plot objects, and returns a single plot object. The plot objects can be any of the plot types supported by Deephaven Express. They can be arranged in a grid, or in a single row or column. The `shared_xaxes` and `shared_yaxes` parameters can be used to share axes between plots. ## Examples -## API Reference +### Four unique plots + +Create a series of plots as subplots, all providing unique perspectives on the data of interest. +```python order=tipping_plots,tips +import deephaven.plot.express as dx +tips = dx.data.tips() # import a ticking version of the Tips dataset + +# create 4 plots from within make_subplots +tipping_plots = dx.make_subplots( + dx.scatter(tips, x="TotalBill", y="Tip", by="Sex", + title="Tip amount by total bill"), + dx.violin(tips, y="TotalBill", by="Day", + title="Total bill distribution by day"), + dx.pie( + tips + .count_by("Count", by=["Sex", "Smoker"]) + .update_view("SmokerStatus = Smoker == `No` ? `non-smoker` : `smoker`") + .update_view("SmokerLabel = Sex + ` ` + SmokerStatus"), + names="SmokerLabel", values="Count", + title="Number of bills by sex and smoking status"), + dx.bar(tips + .view(["TotalBill", "Tip", "Day"]) + .avg_by("Day"), + x="Day", y=["TotalBill", "Tip"], + title="Average tip as a fraction of total bill"), + rows=2, cols=2, shared_xaxes=False, shared_yaxes=False +) +``` + +## API Reference ```{eval-rst} ..
dhautofunction:: deephaven.plot.express.make_subplots -``` +``` \ No newline at end of file diff --git a/plugins/plotly-express/docs/sunburst.md b/plugins/plotly-express/docs/sunburst.md index 01758ed37..341ec1bcf 100644 --- a/plugins/plotly-express/docs/sunburst.md +++ b/plugins/plotly-express/docs/sunburst.md @@ -2,14 +2,36 @@ Sunburst plots are a data visualization technique used to represent hierarchical data with a radial layout. They display data as nested rings or sectors, where each level of the hierarchy is represented by a ring, and each category or subcategory is shown as a sector within the ring. Sunburst plots provide an effective way to visualize hierarchical data structures and the relationships between different levels and categories within the data, making them a valuable tool for understanding complex data hierarchies. -Sunburst plots are useful for: +Sunburst plots are appropriate when the data have a hierarchical structure. Each level of the hierarchy consists of a categorical variable and an associated numeric variable with a value for each unique category. -1. **Hierarchical Data Visualization**: Sunburst plots are valuable for visualizing hierarchical data structures, making them suitable for applications where data has multiple levels of nested categories or relationships. Developers can use sunburst plots to represent data in a manner that clearly illustrates the hierarchical organization of information. -2. **Tree Maps Replacement**: Sunburst plots can be an alternative to tree maps for visualizing hierarchical data. Developers can use sunburst plots to present hierarchical data in a space-efficient and visually appealing manner. This can be particularly beneficial in applications where screen real estate is limited, and users need to view hierarchical data with an interactive and intuitive interface. -3. 
**Drill-Down Data Exploration**: Developers can implement sunburst plots for drill-down data exploration, allowing users to interactively explore and delve deeper into hierarchical data by clicking on sectors to reveal lower-level categories or information. This use case is valuable in applications that require detailed hierarchical data analysis. +### What are sunburst plots useful for? + +- **Hierarchical Data Visualization**: Sunburst plots are valuable for visualizing hierarchical data structures, making them suitable for applications where data has multiple levels of nested categories or relationships. Developers can use sunburst plots to represent data in a manner that clearly illustrates the hierarchical organization of information. +- **Tree Maps Replacement**: Sunburst plots can be an alternative to tree maps for visualizing hierarchical data. Developers can use sunburst plots to present hierarchical data in a space-efficient and visually appealing manner. This can be particularly beneficial in applications where screen real estate is limited, and users need to view hierarchical data with an interactive and intuitive interface. +- **Drill-Down Data Exploration**: Developers can implement sunburst plots for drill-down data exploration, allowing users to interactively explore and delve deeper into hierarchical data by clicking on sectors to reveal lower-level categories or information. This use case is valuable in applications that require detailed hierarchical data analysis. ## Examples +### A basic sunburst plot + +Visualize a hierarchical dataset as concentric circles, with the size of each group decreasing in a counter-clockwise fashion. Use the `names` argument to specify the column name for each group's labels, the `values` argument to specify the column name for each group's values, and the `parents` argument to specify the column name for each group's parent category.
+ +```python order=sunburst_plot,gapminder_recent,gapminder +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() + +# create table of only the most recent year of data, compute total population for each continent +gapminder_recent = ( + gapminder + .last_by("Country") + .view(["Continent", "Pop"]) + .sum_by("Continent") + .update("World = `World`") +) + +sunburst_plot = dx.sunburst(gapminder_recent, names="Continent", values="Pop", parents="World") +``` + ## API Reference ```{eval-rst} .. dhautofunction:: deephaven.plot.express.sunburst diff --git a/plugins/plotly-express/docs/timeline.md b/plugins/plotly-express/docs/timeline.md index 701a31f7f..3c134f9db 100644 --- a/plugins/plotly-express/docs/timeline.md +++ b/plugins/plotly-express/docs/timeline.md @@ -1,9 +1,21 @@ # Timeline Plot -Timeline plots in offer a means to visualize time-related data, displaying events, durations, or activities along a time axis. Developers can utilize these plots for applications that require users to understand temporal patterns and relationships, such as project management, event scheduling, and historical data analysis. +Timeline plots offer a way to visualize time-related data, displaying events, durations, or activities along a time axis. Developers can utilize these plots for applications that require users to understand temporal patterns and relationships, such as project management, event scheduling, and historical data analysis. + +A timeline plot is appropriate when the data contain a categorical variable whose categories become relevant in different places across a timeline. An example may be the years that various members in a band have been active - some may have been active for the duration of the band's career, others may have only appeared in the early days and then left, some may have passed away and been replaced, and so on. 
Timeline plots are often used to display this data, such as [this timeline plot](https://en.wikipedia.org/wiki/Metallica#Timeline) detailing the member composition of the band Metallica throughout the years. ## Examples +### A basic timeline plot + +Visualize the amount of time that each category in a column met a specific criterion. Pass the start and end timestamp column names to the `x_start` and `x_end` arguments, and the category column name to the `y` argument. + +```python order=timeline_plot,jobs +import deephaven.plot.express as dx +jobs = dx.data.jobs() + +timeline_plot = dx.timeline(jobs, x_start="StartTime", x_end="EndTime", y="Job") +``` ## API Reference ```{eval-rst} diff --git a/plugins/plotly-express/docs/treemap.md b/plugins/plotly-express/docs/treemap.md index 0078dcd12..d84dbed97 100644 --- a/plugins/plotly-express/docs/treemap.md +++ b/plugins/plotly-express/docs/treemap.md @@ -2,14 +2,35 @@ Treemap plots are a data visualization technique used to represent hierarchical data in a space-filling manner. They display data as nested rectangles or squares, where the size and color of each rectangle represent the values or categories within the hierarchy. Developers can create treemaps to provide users with an efficient and visually intuitive way to explore hierarchical data structures and understand relationships between categories and subcategories, making them a valuable tool for various applications that involve hierarchical data presentation, analysis, and exploration. -Treemap plots are useful for: +Treemap plots are appropriate when the data have a hierarchical structure. Each level of the hierarchy consists of a categorical variable and an associated numeric variable with a value for each unique category. -1. **Hierarchical Data Visualization**: Treemap plots are valuable for visualizing hierarchical data structures with multiple levels of nested categories or relationships.
Developers can use treemaps to represent data in a space-efficient manner, making them suitable for applications where data has complex hierarchical organizations. -2. **Hierarchical Data Comparison**: Treemaps can be used to compare data within hierarchical structures, allowing users to understand the distribution of categories and their relative sizes. Developers can implement features that enable users to compare data across multiple hierarchies or time periods. -3. **Data Summarization**: Treemaps are effective for summarizing large amounts of hierarchical data into a compact, visual format. Developers can use treemaps to provide users with an overview of hierarchical data, and users can drill down into specific categories for more detailed information. +### What are treemap plots useful for? + +- **Visualizing hierarchical data**: Treemap plots are valuable for visualizing hierarchical data structures with multiple levels of nested categories or relationships. Developers can use treemaps to represent data in a space-efficient manner, making them suitable for applications where data has complex hierarchical organizations. +- **Hierarchical data comparison**: Treemaps can be used to compare data within hierarchical structures, allowing users to understand the distribution of categories and their relative sizes. Developers can implement features that enable users to compare data across multiple hierarchies or time periods. +- **Data summarization**: Treemaps are effective for summarizing large amounts of hierarchical data into a compact, visual format. Developers can use treemaps to provide users with an overview of hierarchical data, and users can drill down into specific categories for more detailed information. ## Examples +### A basic treemap plot + +Visualize a hierarchical dataset as nested rectangles, with the size of each rectangle corresponding to a value for a particular group. 
Use the `names` argument to specify the column name for each group's labels, the `values` argument to specify the column name for each group's values, and the `parents` argument to specify the column name for each group's parent category. + +```python order=treemap_plot,gapminder_recent,gapminder +import deephaven.plot.express as dx +gapminder = dx.data.gapminder() + +# create table of only the most recent year of data, compute total population for each continent +gapminder_recent = ( + gapminder + .last_by("Country") + .view(["Continent", "Pop"]) + .sum_by("Continent") + .update("World = `World`") +) + +treemap_plot = dx.treemap(gapminder_recent, names="Continent", values="Pop", parents="World") +``` ## API Reference ```{eval-rst} diff --git a/plugins/plotly-express/docs/violin.md b/plugins/plotly-express/docs/violin.md index 2335e3a0d..8797d812b 100644 --- a/plugins/plotly-express/docs/violin.md +++ b/plugins/plotly-express/docs/violin.md @@ -2,15 +2,42 @@ A violin plot is a data visualization that combines a box plot with a rotated kernel density plot to provide a comprehensive representation of the data distribution. It offers a detailed view of the data's central tendency, spread and density. -Violin plots are useful for: +Violin plots are appropriate when the data contain a continuous variable of interest. If there is an additional categorical variable that the variable of interest depends on, side-by-side violin plots may be appropriate using the `by` argument. -1. **Data Distribution Comparison**: Violin plots are effective for visually comparing and contrasting the distribution of multiple datasets or categories, allowing for quick identification of differences in data patterns. -2. **Density Estimation**: They offer a detailed view of data density, making it easier to understand the distribution's shape, modes, and any potential multi-modality. -3.
**Central Tendency and Spread**: Violin plots provide insights into the central tendencies and variability of data, including the median, quartiles, and potential outliers. -4. **Multimodal Data**: They are particularly useful when dealing with data that exhibits multiple modes or peaks, as they can reveal these underlying patterns effectively. +### What are violin plots useful for? + +- **Comparing distributions**: Violin plots are effective for visually comparing and contrasting the distribution of multiple datasets or categories, allowing for quick identification of differences in data patterns. +- **Assessing central tendency and spread**: Violin plots provide insights into the central tendencies and variability of data, including the median, quartiles, and potential outliers. +- **Identifying multimodal data**: They are particularly useful when dealing with data that exhibits multiple modes or peaks, as they can reveal these underlying patterns effectively. ## Examples +### A basic violin plot + +Visualize the distribution of a single variable by passing the column name to the `x` or `y` arguments. + +```python order=violin_plot_x,violin_plot_y,versicolor +import deephaven.plot.express as dx +iris = dx.data.iris() + +# subset to get a specific group +versicolor = iris.where("Species == `versicolor`") + +# control the plot orientation using `x` or `y` +violin_plot_x = dx.violin(versicolor, x="SepalLength") +violin_plot_y = dx.violin(versicolor, y="SepalLength") +``` + +### Distributions for multiple groups + +Create separate violins for each group of data by passing the name of the grouping column(s) to the `by` argument. + +```python order=violin_plot_group,iris +import deephaven.plot.express as dx +iris = dx.data.iris() + +violin_plot_group = dx.violin(iris, x="SepalLength", by="Species") +``` ## API Reference ```{eval-rst}