-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rust, python): Return datetime for mean/median of Date column #15679
Conversation
920cb7e
to
f9c3401
Compare
f9c3401
to
798fda9
Compare
CodSpeed Performance ReportMerging #15679 will not alter performanceComparing Summary
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15679 +/- ##
===========================================
+ Coverage 39.14% 81.34% +42.19%
===========================================
Files 1364 1374 +10
Lines 167705 176486 +8781
Branches 3031 2538 -493
===========================================
+ Hits 65646 143559 +77913
+ Misses 101592 32448 -69144
- Partials 467 479 +12
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
||
match self.dtype() { | ||
Boolean => self.cast(&Float64).unwrap().agg_mean(groups), | ||
Float32 => SeriesWrap(self.f32().unwrap().clone()).agg_mean(groups), | ||
Float64 => SeriesWrap(self.f64().unwrap().clone()).agg_mean(groups), | ||
dt if dt.is_numeric() => apply_method_physical_integer!(self, agg_mean, groups), | ||
#[cfg(feature = "dtype-date")] | ||
Date => temporal_mean( | ||
&(self.cast(&Int64).unwrap() * (MS_IN_DAY as f64)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we cast this after the aggregation. That should be much cheaper.
dt if dt.is_numeric() => apply_method_physical_integer!(self, agg_median, groups), | ||
#[cfg(feature = "dtype-date")] | ||
Date => temporal_median( | ||
&(self.cast(&Int64).unwrap() * (MS_IN_DAY as f64)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem
@@ -54,6 +54,7 @@ pub(crate) enum AggregateFunction { | |||
SumU64(SumAgg<u64>), | |||
SumI32(SumAgg<i32>), | |||
SumI64(SumAgg<i64>), | |||
MeanDate(MeanAgg<i64>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this for now. We are redesigning the streaming engine and don't want to add stuff atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it all felt ugly to me anyway.
Thank you @mcrumiller. I have left a few comments. I think we should remove the streaming engine for now. That's something I want to redesign. |
@ritchie46 removing the streaming code ends up breaking Should I try to find a workaround, or should I remove the date mean/median from the streaming tests? |
After thinking of it a bit. We should remove the physical implementations (we could panic there) and do all the casts on the logical side. During type-coercion we can set the proper casts. This will ensure it works on all physical engines. And we panic on the physical side to ensure we don't hit those paths |
Thanks @ritchie46. I may need a little help here though. Are you referring to logical and physical dtypes (I think you are), or logical plan versus physical plan? Also, this sort of sounds like it might be better off as a separate PR, do you agree? One of the issues I see is that different logical types have different physical implementations than their pure physical counterpart (at least, now |
@ritchie46 any proposal for how to proceed with this one? I'm sort of waiting on the streaming engine rewrite. Should we merge and just have date mean broken for streaming until that comes out? Or alternatively, keep my prior "hack" to get it working on the current engine? Or maybe I missed something from your comment that means I can resolve this immediately. I've been busy and let this slip. |
Resolves #13598. Part of overall #13599.
This treats dates and datetimes as analagous to integers/floats: the mean of ints is a float, and now the mean of dates is a datetime:
This is the last temporal mean/median PR, which means a lot of custom logic for the different temporal types has been collapsed/simplified. If this is accepted, we can deprecate
dt.mean
anddt.median
, as mean/median for all temporals have now been implemented, they are now functionally equivalent toseries.mean
andseries.median
for all temporal dtypes. I'll do a followup onstd
/var
/quantile
.This PR is almost identical to #13492, but that was pretty old/stale and cluttered, so easier to just make a new one. @ritchie46 to get streaming to work, I had to update the
MeanAgg
strutc with a flag to indicat the Date -> Datetime conversion. AFAIK Date is the only dtype that requires a value conversion for mean now, so I don't think we need a more general implementation, but you may think otherwise.