Support by(max_n) and by(min_n) #1229
df_pd.at[2,'f32'] = nan
df_pd.at[2,'f64'] = nan
df_pd.at[2,'plusminus'] = nan
# x 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
I find myself creating this manual table whenever I need to check new tests, so I am including it here to make it easier to check new tests in future.
@@ -652,7 +708,7 @@ def test_categorical_sum_binning(df):

 @pytest.mark.parametrize('df', dfs)
-def test_categorical_max(df):
+def test_categorical_max2(df):
Keeping the existing categorical max test but renaming it so it doesn't overwrite the new one which is above it in this file.
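For readers unfamiliar with why the rename matters, here is a minimal sketch of the underlying Python behaviour. The function name `test_example` is made up for illustration and is not from this PR. Defining two functions with the same name in one module rebinds the name, so only the later definition remains reachable by name, and a test runner that looks tests up by name would only see that one.

```python
def test_example():
    return "first definition"


# Keep a handle on the first function so we can compare afterwards.
first = test_example


def test_example():  # Same name: this rebinds it and hides the first definition.
    return "second definition"


# Only the later definition is reachable through the module-level name.
result = test_example()
is_same_function = test_example is first
print(result, is_same_function)
```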
Codecov Report
@@ Coverage Diff @@
## main #1229 +/- ##
==========================================
- Coverage 83.62% 83.59% -0.04%
==========================================
Files 35 35
Lines 8738 8751 +13
==========================================
+ Hits 7307 7315 +8
- Misses 1431 1436 +5
Thanks!
We'll need some docs at the Datashader level when you're done with all this, of course.
Support for categorical `max_n` and `min_n` reductions such as `ds.by("cat", ds.max_n("value", n=3))` on CPU and GPU both with and without dask. This is the first part of issue #1210, support for categorical `first_n`, `last_n` and `where` to follow.

Example:
which prints
Note that the returned DataArray has shape `(ny, nx, ncat, n)` which I think is more logical than the alternative possibility of `(ny, nx, n, ncat)`.

In terms of implementation, functions like `nanmax_n_in_place` now always accept a 4D array so that there is a single implementation for 3D (`max`) and 4D (`max_n`) arrays for each of CPU and GPU. Use of the combine function in `max` inserts the extra dimension of size 1 to change the shape without copying any data.
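
The idea of sharing one 4D implementation can be sketched in plain NumPy. This is an illustration only, not Datashader's actual code: the function `nanmax_n_in_place_sketch` and the sample arrays are hypothetical, and it assumes each length-`n` vector is sorted in descending order with NaN padding at the end. The sketch shows that adding a trailing axis of size 1 yields a view that shares memory with the 3D array, so the 4D routine updates the 3D `max` aggregate in place.

```python
import numpy as np


def nanmax_n_in_place_sketch(ret, other):
    """Merge `other` into `ret`, keeping the n largest non-NaN values.

    Hypothetical illustration. Both arrays have shape (ny, nx, ncat, n) and
    each length-n vector is assumed sorted descending with NaN padding last.
    """
    ny, nx, ncat, n = ret.shape
    for y in range(ny):
        for x in range(nx):
            for c in range(ncat):
                row = ret[y, x, c]  # Basic indexing gives a view into `ret`.
                for value in other[y, x, c]:
                    if np.isnan(value):
                        break  # Remaining entries are NaN padding.
                    for i in range(n):
                        if np.isnan(row[i]) or value > row[i]:
                            # Shift smaller values right, dropping the last.
                            row[i + 1:] = row[i:-1].copy()
                            row[i] = value
                            break


# 3D case (like `max`): arrays of shape (ny, nx, ncat) = (1, 2, 2).
agg3d = np.array([[[1.0, np.nan], [4.0, 2.0]]])
other3d = np.array([[[3.0, 5.0], [np.nan, 7.0]]])

# A trailing axis of size 1 changes the shape without copying any data.
agg4d = agg3d[..., np.newaxis]
other4d = other3d[..., np.newaxis]
shares = np.shares_memory(agg3d, agg4d)

# The 4D routine writes through the view, so `agg3d` itself is updated.
nanmax_n_in_place_sketch(agg4d, other4d)

# 4D case (like `max_n` with n=3): shape (ny, nx, ncat, n) = (1, 1, 1, 3).
ret_n = np.array([[[[9.0, 4.0, np.nan]]]])
other_n = np.array([[[[6.0, 5.0, 1.0]]]])
nanmax_n_in_place_sketch(ret_n, other_n)

print(agg3d.tolist(), ret_n.tolist())
```

The explicit Python loops are for readability; a real implementation would presumably be compiled for CPU and GPU, which this sketch does not attempt.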