Add `BetaGeoBetaBinom` Distribution Block #431

ColtAllen · 2023-11-11T08:14:59Z

This PR resurrects #188 to partially address #176.

The Beta-Geometric/Beta-Binomial (BG/BB) model is a CLV model for discrete, non-contractual use cases (sporting events are a good example). This PR adds a distribution block to be used for posterior predictive checks and for the logp of the full model, which will be added in a separate PR.

For now this is a draft PR because I can't get the logp to return the correct values. For reference, see equation (5) on p. 4 of the research paper:

https://www.brucehardie.com/papers/020/fader_et_al_mksc_10.pdf
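For readers without the paper at hand, here is equation (5) as I read it, restated for a customer with x repeat transactions, the last of them at opportunity t_x, out of n transaction opportunities (n appears to play the role of T in the discussion below). B is the beta function, the purchase probability p follows Beta(α, β), and the drop-out probability θ follows Beta(γ, δ). The first term covers a customer who is still alive after all n opportunities; each term of the sum covers a customer who dropped out after t_x + i opportunities.

```latex
L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)
  = \frac{B(\alpha + x, \beta + n - x)}{B(\alpha, \beta)}
    \frac{B(\gamma, \delta + n)}{B(\gamma, \delta)}
  + \sum_{i=0}^{n - t_x - 1}
    \frac{B(\alpha + x, \beta + t_x - x + i)}{B(\alpha, \beta)}
    \frac{B(\gamma + 1, \delta + t_x + i)}{B(\gamma, \delta)}
```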

I could use some help with the pytensor functions this requires.
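To get target values for checking the PyTensor logp, here is a minimal NumPy/SciPy sketch of equation (5) for a single customer. It is not the implementation in this PR, and the names `bgbb_loglik` and `total_probability` are hypothetical. It works in log space with `scipy.special.betaln` and `scipy.special.logsumexp` to avoid overflow in the beta functions. It also includes a normalization check: summed over every possible purchase sequence, the likelihood should equal 1.

```python
from math import comb

import numpy as np
from scipy.special import betaln, logsumexp


def bgbb_loglik(x, t_x, n, alpha, beta, gamma, delta):
    """Log of equation (5) for one customer: x repeat transactions,
    the last at opportunity t_x, out of n opportunities (hypothetical helper)."""
    log_b_ab = betaln(alpha, beta)
    log_b_gd = betaln(gamma, delta)
    # Customer is still alive after all n opportunities.
    alive = (
        betaln(alpha + x, beta + n - x) - log_b_ab
        + betaln(gamma, delta + n) - log_b_gd
    )
    # Customer dropped out after t_x + i opportunities, i = 0, ..., n - t_x - 1.
    # This is the finite sum in (5); it is empty when t_x == n.
    i = np.arange(n - t_x)
    dropped = (
        betaln(alpha + x, beta + t_x - x + i) - log_b_ab
        + betaln(gamma + 1, delta + t_x + i) - log_b_gd
    )
    # Combine all terms in log space.
    return logsumexp(np.append(dropped, alive))


def total_probability(n, alpha, beta, gamma, delta):
    """Sanity check: summing (5) over every possible purchase sequence gives 1.

    A sequence with x >= 1 purchases, the last at opportunity t_x, can be
    arranged in comb(t_x - 1, x - 1) ways; the all-zero sequence is unique.
    """
    params = (alpha, beta, gamma, delta)
    total = np.exp(bgbb_loglik(0, 0, n, *params))
    for t_x in range(1, n + 1):
        for x in range(1, t_x + 1):
            total += comb(t_x - 1, x - 1) * np.exp(bgbb_loglik(x, t_x, n, *params))
    return total
```

For example, `total_probability(6, 1.2, 0.75, 0.65, 2.8)` should return 1 up to floating-point error (the parameter values are arbitrary), which makes it a cheap test for any reimplementation of the logp.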

📚 Documentation preview 📚: https://pymc-marketing--431.org.readthedocs.build/en/431/

pymc_marketing/clv/distributions.py

ricardoV94 · 2023-11-11T12:05:13Z

I'll try to check this week @ColtAllen

pymc_marketing/clv/distributions.py

ricardoV94 · 2023-11-14T18:24:24Z

I pushed a version of the logp that works with Scan. I think we can make it work without Scan by evaluating on the maximum `T - t_x` range. I just don't know whether this would waste much computation in typical use cases (for instance, if one data point has a very large range but most other data points have smaller ones).
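The Scan-free alternative described here can be sketched in NumPy (the helper `bgbb_logp_masked` is hypothetical, not the PR's PyTensor code): evaluate every customer's inner sum on one shared grid i = 0, ..., max(T - t_x) - 1, and mask the terms with i >= T - t_x by setting them to -inf before the log-sum-exp. The (N, K) intermediate arrays make the cost concern concrete: a single customer with a large T - t_x sets K for everyone.

```python
import numpy as np
from scipy.special import betaln, logsumexp


def bgbb_logp_masked(x, t_x, T, alpha, beta, gamma, delta):
    """Vectorized BG/BB log-likelihood for 1-D arrays x, t_x, T (one entry per customer).

    Each customer's inner sum is evaluated on the shared grid
    i = 0, ..., max(T - t_x) - 1; terms with i >= T - t_x are masked out.
    """
    x, t_x, T = (np.asarray(a)[:, None] for a in (x, t_x, T))  # shape (N, 1)
    i = np.arange(int((T - t_x).max()))[None, :]  # shape (1, K)
    log_b_ab = betaln(alpha, beta)
    log_b_gd = betaln(gamma, delta)

    # Term for still being alive after all T opportunities: shape (N, 1).
    alive = (
        betaln(alpha + x, beta + T - x) - log_b_ab
        + betaln(gamma, delta + T) - log_b_gd
    )
    # Drop-out terms on the padded grid: shape (N, K).
    dropped = (
        betaln(alpha + x, beta + t_x - x + i) - log_b_ab
        + betaln(gamma + 1, delta + t_x + i) - log_b_gd
    )
    # Padding entries get -inf so they contribute exp(-inf) = 0 to the sum.
    dropped = np.where(i < T - t_x, dropped, -np.inf)
    return logsumexp(np.concatenate([dropped, alive], axis=1), axis=1)
```

Whether the padded grid costs more than sequential iteration with Scan depends on how spread out the `T - t_x` values are in the data.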

codecov · 2023-11-14T18:42:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.08%. Comparing base (9aa8bc0) to head (60a1f55).

❗ Current head 60a1f55 differs from pull request most recent head c376695

Please upload reports for the commit c376695 to get more accurate results.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #431      +/-   ##
==========================================
- Coverage   92.25%   92.08%   -0.17%     
==========================================
  Files          24       24              
  Lines        2414     2490      +76     
==========================================
+ Hits         2227     2293      +66     
- Misses        187      197      +10


pymc_marketing/clv/distributions.py

review-notebook-app · 2023-12-11T18:41:45Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

ColtAllen · 2023-12-11T18:45:46Z

I added the likelihood expression to the docstrings, along with a dev notebook. Not sure if the LaTeX rendered properly in the docstrings though, because the docs build is failing due to a pytensor issue:

ImportError: cannot import name 'vectorize' from 'pytensor.graph' 
(/home/docs/checkouts/readthedocs.org/user_builds/pymc-marketing/envs/431/lib/python3.10/site-packages/pytensor/graph/__init__.py)

ColtAllen · 2023-12-11T19:00:18Z

> I pushed a version of the logp that works with Scan. I think we can make it work without a Scan, by evaluating on the maximum T-tx range. I just don't know if this results in much wasted computation in general use cases (for instance if there's one datapoint with a very large range, but most other datapoints have smaller ranges)

In general I don't see very large values of T being a common use case with this model. Is Scan the most performant approach? If so I'm fine leaving it as-is.

ricardoV94 · 2023-12-16T16:16:09Z

> I added the likelihood expression to the docstrings, along with a dev notebook. Not sure if the LaTeX rendered properly in the docstrings though, because the docs build is failing due to a pytensor issue:
>
> ImportError: cannot import name 'vectorize' from 'pytensor.graph'
> (/home/docs/checkouts/readthedocs.org/user_builds/pymc-marketing/envs/431/lib/python3.10/site-packages/pytensor/graph/__init__.py)

It's now called `vectorize_graph`.

ricardoV94 · 2023-12-16T16:16:41Z

> Is Scan the most performant approach? If so I'm fine leaving it as-is.

Scan is the simplest approach.

ColtAllen · 2023-12-16T17:19:18Z

Tests are still failing with vectorize_graph. Do the library dependencies require updating?

ricardoV94 · 2023-12-16T17:22:58Z

Yes, you need to bump the oldest supported PyMC version to the most recent release.

ricardoV94 · 2023-12-16T17:24:22Z

Here (https://github.com/pymc-labs/pymc-marketing/blob/main/pyproject.toml#L20) and in the CI workflow file.

ColtAllen · 2024-02-24T18:37:59Z

I had to increase `rtol` because we're comparing against the data simulation function in lifetimes, which lacks a random seed. If the `sim_data` function looks good and I'm indexing correctly in this test, this should be good to go.
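A seeded simulator would let a test compare against fixed reference draws rather than relying on a loose `rtol`. Below is a minimal NumPy sketch of the BG/BB generative process; the function `simulate_bgbb` is hypothetical and is neither the PR's `sim_data` nor the lifetimes function. It assumes an active customer drops out at the start of each opportunity with probability θ and otherwise purchases with probability p, which is the convention consistent with the structure of equation (5). Passing a `seed` to `np.random.default_rng` makes the draws reproducible.

```python
import numpy as np


def simulate_bgbb(T, alpha, beta, gamma, delta, size, seed=None):
    """Draw (x, t_x) for `size` customers observed over T opportunities."""
    rng = np.random.default_rng(seed)  # seeded generator -> reproducible draws
    p = rng.beta(alpha, beta, size=size)  # per-customer purchase probability
    theta = rng.beta(gamma, delta, size=size)  # per-customer drop-out probability
    x = np.zeros(size, dtype=int)  # number of repeat purchases
    t_x = np.zeros(size, dtype=int)  # opportunity of the last purchase (0 if none)
    alive = np.ones(size, dtype=bool)
    for t in range(1, T + 1):
        # An active customer drops out at the start of opportunity t with prob. theta...
        alive &= rng.random(size) >= theta
        # ...and, if still active, purchases with probability p.
        purchase = alive & (rng.random(size) < p)
        x += purchase
        t_x[purchase] = t
    return x, t_x
```

Calling it twice with the same `seed` returns identical arrays, so expected values can be pinned in a test.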

juanitorduz

@ColtAllen this looks good! I wanna merge this one and iterate (I can help with that). I have just two requests :D :

- Remove commented code
- Fix pre-commit (I think Ruff is complaining)

Otherwise LGTM.

pymc_marketing/clv/distributions.py

ColtAllen · 2024-05-26T23:43:23Z

> @ColtAllen this looks good! I wanna merge this one and iterate (I can help with that). I have just two requests :D :
>
> - Remove commented code
> - Fix pre-commit (I think Ruff is complaining)
>
> Otherwise LGTM.

First point is done, but Ruff is complaining that a LaTeX expression in the docstring is too long. I'm not sure there's any way to shorten it without breaking how the docstring renders.

wd60622 · 2024-05-27T00:35:49Z

> First point is done, but Ruff is complaining about a LaTeX expression in the docstring that is too long. I'm not sure if there's any way to shorten it that doesn't prevent the docstring from rendering properly.

It doesn't seem possible to turn the rule off and back on at the moment, based on this Ruff issue. Maybe you can define the string outside of the class and then use an f-string; that could save eight or so characters on a line.
Alternatively, we could turn it off for the whole file if there is a docstring-specific option.

juanitorduz · 2024-05-27T08:55:51Z

I think we can simply do it as in https://github.com/pymc-labs/pymc-marketing/blob/main/pymc_marketing/clv/distributions.py#L407
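The PR ultimately went with an ignore comment (the final "add ruff ignore comment" commit). Here is a sketch of how that can look, assuming the rule being triggered is Ruff's `E501` (line too long): Ruff's documentation says that for multi-line strings such as docstrings, the `# noqa` directive goes after the closing quotes and then applies to the whole string. The class below is a hypothetical stand-in, and the raw-string prefix keeps the LaTeX backslashes intact.

```python
class BetaGeoBetaBinomDocSketch:  # hypothetical stand-in, not the PR's class
    r"""Beta-Geometric/Beta-Binomial distribution (docstring sketch).

    .. math::

        \mathbb{L}(\alpha, \beta, \gamma, \delta \mid x, t_x, n) = \frac{B(\alpha + x, \beta + n - x)}{B(\alpha, \beta)} \frac{B(\gamma, \delta + n)}{B(\gamma, \delta)} + \sum_{i=0}^{n - t_x - 1} \frac{B(\alpha + x, \beta + t_x - x + i)}{B(\alpha, \beta)} \frac{B(\gamma + 1, \delta + t_x + i)}{B(\gamma, \delta)}
    """  # noqa: E501
```

One caveat with the f-string idea above: Python does not treat an f-string literal as a docstring, so the class's `__doc__` would be `None`.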

pymc_marketing/clv/distributions.py

juanitorduz · 2024-05-27T18:29:00Z

Thanks @ColtAllen ! Let's now work on the model 🎉 !

* init commit
* removed scan
* Fix logp
* Remove print statement
* Add test for logp notimplemented errors
* docstrings
* dev notebook added
* updated to vectorize_graph
* import order
* update oldest pymc_version
* Update ci.yml pymc version
* Update pyproject.toml pymc version
* WIP sample prior testing
* sample prior compared against lifetimes
* increase rtol
* remove commented code, add logp reference
* fix latex docstring
* notebook testing and misc edits
* revert latex in docstring
* add ruff ignore comment

---------

Co-authored-by: Ricardo Vieira <[email protected]>

init commit

81c6e54

ColtAllen added the enhancement (New feature or request), help wanted (Extra attention is needed), and CLV labels Nov 11, 2023

ColtAllen requested a review from ricardoV94 November 11, 2023 08:14

ColtAllen self-assigned this Nov 11, 2023

ricardoV94 reviewed Nov 11, 2023

pymc_marketing/clv/distributions.py (outdated, resolved)

pymc_marketing/clv/distributions.py (outdated, resolved)

removed scan

882018e

ricardoV94 reviewed Nov 14, 2023

pymc_marketing/clv/distributions.py (outdated, resolved)

Fix logp

a96929b

Remove print statement

30c39a0

Add test for logp notimplemented errors

777f715

ricardoV94 reviewed Nov 14, 2023

pymc_marketing/clv/distributions.py (resolved)

ColtAllen and others added 3 commits December 11, 2023 10:47

Merge branch 'pymc-labs:main' into bgbb_dist

9b59449

docstrings

0aac157

dev notebook added

2bac71d

updated to vectorize_graph

7472fee

ColtAllen added 2 commits January 7, 2024 09:00

Merge branch 'pymc-labs:main' into bgbb_dist

6a0239d

Merge branch 'pymc-labs:main' into bgbb_dist

e70caa6

ColtAllen and others added 2 commits February 24, 2024 11:25

Merge branch 'pymc-labs:main' into bgbb_dist

4040ffc

increase rtol

d4d3455

ColtAllen added 8 commits March 25, 2024 08:13

Merge branch 'main' into bgbb_dist

bb6b893

Merge branch 'pymc-labs:main' into bgbb_dist

22d1269

Merge branch 'main' into bgbb_dist

618f0b3

Merge branch 'pymc-labs:main' into bgbb_dist

f0d9588

Merge branch 'pymc-labs:main' into bgbb_dist

d839bee

Merge branch 'pymc-labs:main' into bgbb_dist

50158a9

Merge branch 'pymc-labs:main' into bgbb_dist

5366087

Merge branch 'pymc-labs:main' into bgbb_dist

455b7a9

juanitorduz requested changes May 23, 2024

pymc_marketing/clv/distributions.py (outdated, resolved)

juanitorduz reviewed May 23, 2024

pymc_marketing/clv/distributions.py (resolved)

ColtAllen added 4 commits May 26, 2024 16:18

remove commented code, add logp reference

c63e94e

fix latex docstring

60a1f55

notebook testing and misc edits

783d6f5

revert latex in docstring

512aea6

juanitorduz reviewed May 27, 2024

pymc_marketing/clv/distributions.py (outdated, resolved)

add ruff ignore comment

c376695

juanitorduz approved these changes May 27, 2024

juanitorduz merged commit ae84163 into pymc-labs:main May 27, 2024
9 checks passed

ColtAllen deleted the bgbb_dist branch May 28, 2024 14:29
