Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BetaGeoBetaBinom Distribution Block #431

Merged
merged 35 commits into from
May 27, 2024

Conversation

ColtAllen
Copy link
Collaborator

@ColtAllen ColtAllen commented Nov 11, 2023

This PR resurrects #188 to partially address #176.

The Beta-Geometic/Beta Binomial model is a CLV model for discrete, non-contractual use cases - a good example would be sporting events. This PR adds a distribution block to be used for posterior predictive checks and the logp in the full model which will be added in a separate PR.

For now this is a draft PR because I can't get the logp to return the correct values; for reference please see (5) on p4 of the research paper:

https://www.brucehardie.com/papers/020/fader_et_al_mksc_10.pdf

I could use some help with the pytensor functions this requires.


📚 Documentation preview 📚: https://pymc-marketing--431.org.readthedocs.build/en/431/

@ColtAllen ColtAllen added enhancement New feature or request help wanted Extra attention is needed CLV labels Nov 11, 2023
@ColtAllen ColtAllen self-assigned this Nov 11, 2023
pymc_marketing/clv/distributions.py Outdated Show resolved Hide resolved
pymc_marketing/clv/distributions.py Outdated Show resolved Hide resolved
@ricardoV94
Copy link
Contributor

I'll try to check this week @ColtAllen

@ricardoV94
Copy link
Contributor

I pushed a version of the logp that works with Scan. I think we can make it work without a Scan, by evaluating on the maximum T-tx range. I just don't know if this results in much wasted computation in general use cases (for instance if there's one datapoint with a very large range, but most other datapoints have smaller ranges)

Copy link

codecov bot commented Nov 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.08%. Comparing base (9aa8bc0) to head (60a1f55).

Current head 60a1f55 differs from pull request most recent head c376695

Please upload reports for the commit c376695 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #431      +/-   ##
==========================================
- Coverage   92.25%   92.08%   -0.17%     
==========================================
  Files          24       24              
  Lines        2414     2490      +76     
==========================================
+ Hits         2227     2293      +66     
- Misses        187      197      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@ColtAllen
Copy link
Collaborator Author

I added the likelihood expression to the docstrings, along with a dev notebook. Not sure if the LaTeX rendered properly in the docstrings though, because the docs build is failing due to a pytensor issue:

ImportError: cannot import name 'vectorize' from 'pytensor.graph' 
(/home/docs/checkouts/readthedocs.org/user_builds/pymc-marketing/envs/431/lib/python3.10/site-packages/pytensor/graph/__init__.py)

@ColtAllen
Copy link
Collaborator Author

I pushed a version of the logp that works with Scan. I think we can make it work without a Scan, by evaluating on the maximum T-tx range. I just don't know if this results in much wasted computation in general use cases (for instance if there's one datapoint with a very large range, but most other datapoints have smaller ranges)

In general I don't see very large values of T being a common use case with this model. Is Scan the most performant approach? If so I'm fine leaving it as-is.

@ricardoV94
Copy link
Contributor

I added the likelihood expression to the docstrings, along with a dev notebook. Not sure if the LaTeX rendered properly in the docstrings though, because the docs build is failing due to a pytensor issue:

ImportError: cannot import name 'vectorize' from 'pytensor.graph' 
(/home/docs/checkouts/readthedocs.org/user_builds/pymc-marketing/envs/431/lib/python3.10/site-packages/pytensor/graph/__init__.py)

It's now called vectorize_graph

@ricardoV94
Copy link
Contributor

Is Scan the most performant approach? If so I'm fine leaving it as-is.

Scan is the simplest approach

@ColtAllen
Copy link
Collaborator Author

Tests are still failing with vectorize_graph. Do the library dependencies require updating?

@ricardoV94
Copy link
Contributor

Yes you need to bump the oldest supported pymc version to the most recent

@ricardoV94
Copy link
Contributor

In here and in the ci wofkflow file: https://github.com/pymc-labs/pymc-marketing/blob/main/pyproject.toml#L20

@ColtAllen
Copy link
Collaborator Author

ColtAllen commented Feb 24, 2024

I had to increase rtol because we're comparing against the data simulation function in lifetimes, and it lacks a random seed. If the sim_data function looks good and I'm indexing properly in this test, this should be good to go.

Copy link
Collaborator

@juanitorduz juanitorduz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ColtAllen this looks good! I wanna merge this one and iterate (I can help with that). I have just two requests :D :

  • Remove commented code
  • Fix pre-commit (I think Ruff is complaining)

Otherwise LGTM.

pymc_marketing/clv/distributions.py Outdated Show resolved Hide resolved
@ColtAllen
Copy link
Collaborator Author

@ColtAllen this looks good! I wanna merge this one and iterate (I can help with that). I have just two requests :D :

  • Remove commented code
  • Fix pre-commit (I think Ruff is complaining)

Otherwise LGTM.

First point is done, but Ruff is complaining about a LaTeX expression in the docstring that is too long. I'm not sure if there's any way to shorten it that doesn't prevent the docstring from rendering properly.

@wd60622
Copy link
Contributor

wd60622 commented May 27, 2024

First point is done, but Ruff is complaining about a LaTeX expression in the docstring that is too long. I'm not sure if there's any way to shorten it that doesn't prevent the docstring from rendering properly.

Doesn't seem to be possible to turn off and on at the moment based on this ruff issue. Maybe you can define the string outside of the class then use f-string. Could reduce 8 something characters on a line.
Could turn off for the whole file if there is a docstring specific option

@juanitorduz
Copy link
Collaborator

I think we can simply do it as in https://github.com/pymc-labs/pymc-marketing/blob/main/pymc_marketing/clv/distributions.py#L407

@juanitorduz juanitorduz merged commit ae84163 into pymc-labs:main May 27, 2024
9 checks passed
@juanitorduz
Copy link
Collaborator

Thanks @ColtAllen ! Let's now work on the model 🎉 !

@ColtAllen ColtAllen deleted the bgbb_dist branch May 28, 2024 14:29
twiecki pushed a commit that referenced this pull request Sep 10, 2024
* init commit

* removed scan

* Fix logp

* Remove print statement

* Add test for logp notimplemented errors

* docstrings

* dev notebook added

* updated to vectorize_graph

* import order

* update oldest pymc_version

* Update ci.yml pymc version

* Update pyproject.toml pymc version

* WIP sample prior testing

* sample prior compared against lifetimes

* increase rtol

* remove commented code, add logp reference

* fix latex docstring

* notebook testing and misc edits

* revert latex in docstring

* add ruff ignore comment

---------

Co-authored-by: Ricardo Vieira <[email protected]>
twiecki pushed a commit that referenced this pull request Sep 10, 2024
* init commit

* removed scan

* Fix logp

* Remove print statement

* Add test for logp notimplemented errors

* docstrings

* dev notebook added

* updated to vectorize_graph

* import order

* update oldest pymc_version

* Update ci.yml pymc version

* Update pyproject.toml pymc version

* WIP sample prior testing

* sample prior compared against lifetimes

* increase rtol

* remove commented code, add logp reference

* fix latex docstring

* notebook testing and misc edits

* revert latex in docstring

* add ruff ignore comment

---------

Co-authored-by: Ricardo Vieira <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLV enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants