-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add method thin_fit_result
to CLV models
#393
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #393 +/- ##
==========================================
- Coverage 90.83% 90.70% -0.13%
==========================================
Files 21 21
Lines 1920 1904 -16
==========================================
- Hits 1744 1727 -17
- Misses 176 177 +1 ☔ View full report in Codecov by Sentry. |
0457137
to
387cc5e
Compare
thin_fit_result
to CLV models
The |
That could be useful. I wonder if we should do it in fit though? Or in the heavy functions like clv? Easier in fit but potentially more annoying as well. |
Where's it most likely to crash? I only suggested |
We always know the size of the samples from the idata |
It shouldn't be in fit, because there are not so many parameters in the model. Any problems will appear in functions that compute summary statistics per observations (most predictive methods, including the |
Sounds good. Maybe add a comment about using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the documentation suggestions, the only other remarks I have concern the reductions in testing coverage.
Are fit_result
and expected_customer_lifetime_value
fully covered when the --runslow
tests are ran?
Also, are there plans to expand the placeholder methods output_var
, _generate_and_preprocess_model_data
, and _data_setter
? I'm not sure what the need is for those methods at the moment.
If you're chasing 100% coverage in beta_geo.py
, you can adapt the following code:
https://github.com/pymc-labs/pymc-marketing/blob/main/tests/clv/models/test_pareto_nbd.py#L302
I think the coverage is just because the tests are not passing yet. Regarding the weird methods, they are abstract methods that the base ModelBuilder class demands but are not at all relevant for CLV |
387cc5e
to
16ee6e8
Compare
@ColtAllen can I get your input again? I opened an issue for documenting the new feature in #448 |
16ee6e8
to
80c48ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did CodeCov not update for the new customer_lifetime_value
test? I also made a note in #448 that it is not resolved by this PR, nor does it have to be.
As for the placeholder methods, I have some ideas on how to use them, but it would be CLV model-specific and probably outside the scope of this PR.
I think this is good to merge. Thanks for cleaning up the CLV base class!
Awesome thanks guys! |
CLV summary statistic models can be quite memory intensive and fail with the standard 4k draws. Because these methods are all built assuming there is a
self.idata
lying around (called in multiple places), the two hacky solutions would be to pass athin
argument to every method or to modify idata inplace.This PR instead adds a method that returns a copy of the CLV model with a thinned dataset. A user can then use the methods in this thinned model to obtain summary stats.
Closes #374
This PR also cleans up several internals in the CLVModel base-class.
TODO
customer_lifetime_value
📚 Documentation preview 📚: https://pymc-marketing--393.org.readthedocs.build/en/393/