Support by(max_n) and by(min_n) #1229

ianthomas23 · 2023-06-06T11:17:31Z

Support for categorical max_n and min_n reductions, such as ds.by("cat", ds.max_n("value", n=3)), on CPU and GPU, both with and without dask. This is the first part of issue #1210; support for categorical first_n, last_n and where will follow.

Example:

import datashader as ds
import numpy as np
from numpy import nan
import pandas as pd

x = np.arange(2)
df = pd.DataFrame(dict(
    y_from = [0.0, 1.0, 0.0, 1.0, 0.0],
    y_to   = [0.0, 1.0, 1.0, 0.0, 0.5],
    value  = [1.1, 3.3, 5.5, 2.2, 4.4],
    cat    = ['a', 'b', 'a', 'b', 'a'],
))
df["cat"] = df["cat"].astype("category")

canvas = ds.Canvas(plot_height=2, plot_width=3)
agg = canvas.line(source=df, x=x, y=["y_from", "y_to"], axis=1,
                  agg=ds.by("cat", ds.max_n("value", n=3)))
print(agg)

which prints

<xarray.DataArray (y: 2, x: 3, cat: 2, n: 3)>
array([[[[5.5, 4.4, 1.1],
         [nan, nan, nan]],

        [[1.1, nan, nan],
         [2.2, nan, nan]],

        [[1.1, nan, nan],
         [2.2, nan, nan]]],


       [[[nan, nan, nan],
         [3.3, 2.2, nan]],

        [[5.5, 4.4, nan],
         [3.3, nan, nan]],

        [[5.5, 4.4, nan],
         [3.3, nan, nan]]]])
Coordinates:
  * x        (x) float64 0.1667 0.5 0.8333
  * y        (y) float64 0.25 0.75
  * cat      (cat) <U1 'a' 'b'
  * n        (n) int64 0 1 2
Attributes:
    x_range:  (0, 1)
    y_range:  (0.0, 1.0)

Note that the returned DataArray has shape (ny, nx, ncat, n) which I think is more logical than the alternative possibility of (ny, nx, n, ncat).
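To illustrate what the (ny, nx, ncat, n) ordering means for indexing, here is a small NumPy-only sketch. It copies the values from the printed output above into a plain array rather than calling datashader, and the variable names (top_n_cat_a, largest, alt) are made up for the example.

```python
import numpy as np

nan = np.nan

# Values copied from the printed DataArray above; axes are (y, x, cat, n).
agg = np.array([
    [[[5.5, 4.4, 1.1], [nan, nan, nan]],
     [[1.1, nan, nan], [2.2, nan, nan]],
     [[1.1, nan, nan], [2.2, nan, nan]]],
    [[[nan, nan, nan], [3.3, 2.2, nan]],
     [[5.5, 4.4, nan], [3.3, nan, nan]],
     [[5.5, 4.4, nan], [3.3, nan, nan]]],
])

# All n values for category 'a' (index 0 on the cat axis): shape (ny, nx, n).
top_n_cat_a = agg[:, :, 0, :]

# The largest value per pixel and category (index 0 on the n axis):
# shape (ny, nx, ncat), the same layout as an ordinary categorical reduction.
largest = agg[..., 0]

# The alternative layout (ny, nx, n, ncat) is only an axis swap away if needed.
alt = np.moveaxis(agg, -1, -2)

print(top_n_cat_a[0, 0])
print(largest.shape, alt.shape)
```

With n as the trailing axis, dropping it by taking index 0 leaves an array with the same (ny, nx, ncat) layout that a non-n categorical reduction produces, which is one way to read the "more logical" argument.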

In terms of implementation, functions like nanmax_n_in_place now always accept a 4D array, so there is a single implementation covering both 3D (max) and 4D (max_n) arrays for each of CPU and GPU. The combine function in max inserts an extra dimension of size 1, which changes the shape without copying any data.
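The idea of treating a 3D max array as a 4D array with a trailing axis of size 1 can be sketched in plain NumPy as follows. This is not the datashader code: nanmax_n_merge_4d is a hypothetical stand-in written for this illustration, and it uses a sort where a real in-place kernel would more likely loop. It only shows that one 4D routine can serve both cases and that the extra axis is a view, not a copy.

```python
import numpy as np


def nanmax_n_merge_4d(ret, other):
    """Hypothetical stand-in: merge `other` into `ret` in place.

    Both arrays have shape (ny, nx, ncat, n). After the call, `ret` holds the
    n largest non-NaN values along the last axis, in descending order and
    padded with NaN.
    """
    n = ret.shape[-1]
    both = np.concatenate([ret, other], axis=-1)
    # np.sort places NaN last, so sorting the negated values and negating
    # again gives descending order with NaN still at the end.
    both = -np.sort(-both, axis=-1)
    ret[...] = both[..., :n]


# max_n case: genuinely 4D, n=3.
acc = np.full((1, 1, 1, 3), np.nan)
nanmax_n_merge_4d(acc, np.array([[[[5.5, 1.1, np.nan]]]]))
nanmax_n_merge_4d(acc, np.array([[[[4.4, np.nan, np.nan]]]]))

# max case: a 3D array viewed as 4D with a trailing axis of size 1.
max3d = np.array([[[1.0, np.nan]]])           # shape (ny, nx, ncat)
max4d = np.expand_dims(max3d, axis=-1)        # shape (ny, nx, ncat, 1), a view
other3d = np.array([[[3.0, 2.0]]])
nanmax_n_merge_4d(max4d, other3d[..., np.newaxis])

print(acc[0, 0, 0])                           # merged top-3 values
print(max3d, np.shares_memory(max3d, max4d))  # 3D array updated through the view
```

Because max4d shares memory with max3d, writing into the 4D view updates the original 3D array, so no data is copied in either direction.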

ianthomas23 · 2023-06-06T11:18:52Z

datashader/tests/test_pandas.py

 df_pd.at[2,'f32'] = nan
 df_pd.at[2,'f64'] = nan
 df_pd.at[2,'plusminus'] = nan
+# x          0  0   0  0 0   0 0  0 0  0   1   1  1   1  1    1  1   1  1   1


I find myself creating this table by hand whenever I need to check new tests, so I am including it here to make new tests easier to check in future.

ianthomas23 · 2023-06-06T11:19:52Z

datashader/tests/test_pandas.py

@@ -652,7 +708,7 @@ def test_categorical_sum_binning(df):


 @pytest.mark.parametrize('df', dfs)
-def test_categorical_max(df):
+def test_categorical_max2(df):


I am keeping the existing categorical max test but renaming it so that it doesn't overwrite the new test of the same name, which is defined above it in this file.

codecov · 2023-06-06T11:53:48Z

Codecov Report

Merging #1229 (bfdcc66) into main (28c8581) will decrease coverage by 0.04%.
The diff coverage is 67.79%.

@@            Coverage Diff             @@
##             main    #1229      +/-   ##
==========================================
- Coverage   83.62%   83.59%   -0.04%     
==========================================
  Files          35       35              
  Lines        8738     8751      +13     
==========================================
+ Hits         7307     7315       +8     
- Misses       1431     1436       +5

Impacted Files                                  Coverage Δ
datashader/transfer_functions/_cuda_utils.py    20.63% <0.00%> (ø)
datashader/reductions.py                        79.02% <47.05%> (-0.22%) ⬇️
datashader/compiler.py                          88.42% <100.00%> (+0.06%) ⬆️
datashader/utils.py                             81.63% <100.00%> (+0.09%) ⬆️

jbednar

Thanks!

jbednar · 2023-06-06T21:38:07Z

We'll need some docs at the Datashader level when you're done with all this, of course.

Support by(max_n) and by(min_n)

bfdcc66

ianthomas23 added this to the v0.15.1 milestone Jun 6, 2023

jbednar approved these changes Jun 6, 2023

ianthomas23 merged commit f917cd9 into holoviz:main Jun 7, 2023

ianthomas23 deleted the cat_max_n branch June 7, 2023 09:46
