Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more content to the Gamma-Gamma Notebook #573

Merged
merged 5 commits into from
Apr 5, 2024

Conversation

juanitorduz
Copy link
Collaborator

@juanitorduz juanitorduz commented Mar 7, 2024

Similar to #571.


📚 Documentation preview 📚: https://pymc-marketing--573.org.readthedocs.build/en/573/

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@juanitorduz juanitorduz self-assigned this Mar 7, 2024
@juanitorduz juanitorduz added docs Improvements or additions to documentation CLV labels Mar 7, 2024
Copy link

codecov bot commented Mar 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.79%. Comparing base (8590b45) to head (4d261db).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #573   +/-   ##
=======================================
  Coverage   91.79%   91.79%           
=======================================
  Files          22       22           
  Lines        2267     2267           
=======================================
  Hits         2081     2081           
  Misses        186      186           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@juanitorduz juanitorduz requested a review from twiecki March 8, 2024 20:45
Copy link

review-notebook-app bot commented Mar 14, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-03-14T19:09:43Z
----------------------------------------------------------------

Why not link directly to the CLV Quickstart csv? It's the same dataset.


@ricardoV94
Copy link
Contributor

@juanitorduz looks great. Could you just install from conda to get rid of those ugly C-API warnings?

@juanitorduz
Copy link
Collaborator Author

@juanitorduz looks great. Could you just install from conda to get rid of those ugly C-API warnings?

I cleaned the outputs :)

Copy link

review-notebook-app bot commented Mar 21, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-03-21T13:50:22Z
----------------------------------------------------------------

Typo; change to (no longer maintained).

Not sure it's worth mentioning, but lifetimes has not had a meaningful update since Jul 2020.


juanitorduz commented on 2024-04-03T07:04:27Z
----------------------------------------------------------------

Thanks! Added

Copy link

review-notebook-app bot commented Mar 21, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-03-21T13:50:22Z
----------------------------------------------------------------

Line #5.    from lifetimes.datasets import load_cdnow_summary_data_with_monetary_value

Remove this and refer to comment in cell where this function is used.


juanitorduz commented on 2024-04-03T07:09:33Z
----------------------------------------------------------------

I think we are "forced" by the ruff notebook style guidelines to have all the imports on top

Copy link

review-notebook-app bot commented Mar 21, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-03-21T13:50:23Z
----------------------------------------------------------------

Line #9.    az.style.use("arviz-darkgrid")

Maybe add a comment that lines 9-16 pertain to the plotting config for the notebook? Optional, but I like to comment code blocks like this where the purpose isn't immediately apparent in a variable name.


juanitorduz commented on 2024-04-03T07:14:34Z
----------------------------------------------------------------

Good point! I will add it to the other notebooks as well.

Copy link

review-notebook-app bot commented Mar 21, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-03-21T13:50:24Z
----------------------------------------------------------------

Line #1.    summary_with_money_value = load_cdnow_summary_data_with_monetary_value()

I'd prefer to keep the legacy lifetimes stuff restricted to model comparison.

The same dataset is loaded from the repo in the Quickstart notebook:

summary_with_money_value = pd.read_csv("https://raw.githubusercontent.com/pymc-labs/pymc-marketing/main/datasets/cdnow_transactions.csv")


juanitorduz commented on 2024-04-03T07:19:46Z
----------------------------------------------------------------

Got it! Done!

@ColtAllen
Copy link
Collaborator

Looking good so far. In addition to minor remarks below, it may be worth adding a practical example to illustrate (#594). A synthetic monetary value column with crazy huge values like this may do the trick:

shape, scale = 1., 10000000. 
s = np.random.gamma(shape, scale, 1000)

Copy link

review-notebook-app bot commented Mar 21, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-03-21T14:05:59Z
----------------------------------------------------------------

Line #1.    ggf = GammaGammaFitter(penalizer_coef=0)

Let's remove the penalizer_coeff parametrization - it has a default value of 0 regardless, and I never liked using it in lifetimes because specifying anything other than 0 creates weird outputs since it applies an L2 parameter shrinkage akin to ridge regression. It's not an appropriate regularization method for a model as non-linear as this one.

. . . Actually, I take that back. Maybe we can highlight the flaws of this as an lead-in to using priors for regularization instead?


juanitorduz commented on 2024-04-03T07:20:44Z
----------------------------------------------------------------

No, I suggest we leave it out and focus on our model. As you mentioned, we can reduce mentioning the lifetimes package.

@ColtAllen
Copy link
Collaborator

Also, it's an important modeling assumption that frequency and monetary value be uncorrelated (less than 0.3). Add this to a cell before the modeling section:

returning_customers_summary[['monetary_value', 'frequency']].corr()

Copy link
Collaborator Author

Thanks! Added


View entire conversation on ReviewNB

Copy link
Collaborator Author

I think we are "forced" by the ruff notebook style guidelines to have all the imports on top


View entire conversation on ReviewNB

Copy link
Collaborator Author

Good point! I will add it to the other notebooks as well.


View entire conversation on ReviewNB

Copy link
Collaborator Author

Got it! Done!


View entire conversation on ReviewNB

Copy link
Collaborator Author

No, I suggest we leave it out and focus on our model. As you mentioned, we can reduce mentioning the lifetimes package.


View entire conversation on ReviewNB

Copy link

review-notebook-app bot commented Apr 4, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-04-04T17:34:28Z
----------------------------------------------------------------

The CDNOW dataset line seems redundant as it's also mentioned in the Read Data section.


juanitorduz commented on 2024-04-04T19:03:35Z
----------------------------------------------------------------

Right! I removed it!

Copy link

review-notebook-app bot commented Apr 4, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-04-04T17:34:28Z
----------------------------------------------------------------

If you re-run this cell, it should clear the NumPy Warning, which may raise questions.


juanitorduz commented on 2024-04-04T19:21:02Z
----------------------------------------------------------------

I actually do not know where they are coming from. Even if I run everything in conda I still get them :(

ColtAllen commented on 2024-04-04T21:47:25Z
----------------------------------------------------------------

A clunky workaround would be to add some code suppressing the warnings, then deleting said code prior to pushing to GitHub.

Copy link

review-notebook-app bot commented Apr 4, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-04-04T17:34:29Z
----------------------------------------------------------------

This is important, but does it need to be in a headline font like this?


juanitorduz commented on 2024-04-04T19:03:14Z
----------------------------------------------------------------

hehe I do not why is it displayed like this but it looks ok on the render docs XD

Copy link

review-notebook-app bot commented Apr 4, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-04-04T17:34:30Z
----------------------------------------------------------------

Purely optional, but this and the related code cells can be moved to the prior section.


Copy link

review-notebook-app bot commented Apr 4, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-04-04T17:34:31Z
----------------------------------------------------------------

If you wish to mention the specific optimizer, it's L-BFGS-B. There is also a typo in "optimizer".


juanitorduz commented on 2024-04-04T19:06:11Z
----------------------------------------------------------------

Good catch! Thanks!

Copy link

review-notebook-app bot commented Apr 4, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-04-04T17:34:32Z
----------------------------------------------------------------

I believe it's worth adding some remarks mentioning how the choice of prior will impact the parameter estimates. Up to you if you also want to add a code example.


juanitorduz commented on 2024-04-04T19:19:28Z
----------------------------------------------------------------

Let me skip this for now as i wanna keep this relative simple. I agree is a great point to maybe we could have a prior sensitivity analysis notebook :)

Copy link
Collaborator Author

Good point! Added a small remark!


View entire conversation on ReviewNB

Copy link
Collaborator Author

Great idea!


View entire conversation on ReviewNB

Copy link
Collaborator Author

Let me skip this for now as i wanna keep this relative simple. I agree is a great point to maybe we could have a prior sensitivity analysis notebook :)


View entire conversation on ReviewNB

Copy link
Collaborator Author

I actually do not know where they are coming from. Even if I run everything in conda I still get them :(


View entire conversation on ReviewNB

Copy link
Collaborator Author

Changed the sampler params a bit and now it looks much better


View entire conversation on ReviewNB

@juanitorduz
Copy link
Collaborator Author

Thanks for the review, @ColtAllen. I think I addressed all of your comments (except the pytensor warnings I am trying to remove). Let me know what you think about this version :)

Copy link
Collaborator

A clunky workaround would be to add some code suppressing the warnings, then deleting said code prior to pushing to GitHub.


View entire conversation on ReviewNB

@ricardoV94
Copy link
Contributor

Are you struggling to install pytensor from conda? Shouldn't be getting warnings :)

@juanitorduz
Copy link
Collaborator Author

Are you struggling to install pytensor from conda? Shouldn't be getting warnings :)

I am using mamba because conda is too slow 🙈 . I will delete and re-create all my envs next week 🤞

@ricardoV94
Copy link
Contributor

mamba/conda is the same. I also use mamba. Still you shouldn't be getting those warnings. You may not be using the right environment in the NB?

Copy link
Collaborator

@ColtAllen ColtAllen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few unresolved formatting and style remarks in ReviewNB, but they're quite minor. Overall I think this is good to merge. Great work!

@juanitorduz juanitorduz merged commit f3ec2cf into main Apr 5, 2024
10 checks passed
@juanitorduz juanitorduz deleted the improve_gamma_gamma_nb branch April 5, 2024 14:40
wd60622 pushed a commit that referenced this pull request Apr 8, 2024
* improve nb

* rm warnings and add link to lifetimes quickstart

* address comments

* feedback part 3

* remove warnings manually
louismagowan pushed a commit to louismagowan/pymc-marketing that referenced this pull request Apr 11, 2024
* improve nb

* rm warnings and add link to lifetimes quickstart

* address comments

* feedback part 3

* remove warnings manually
twiecki pushed a commit that referenced this pull request May 21, 2024
* feat(streamlit_explainer): Pushing files for Streamlit explainer app, to illustrate saturation, adstock and prior concepts in an intuitive, visual way to stakeholders and new MMMers

* chore(readme): Adding a readme for the app

* fix(env): Updating dependencies to include those needed for the Streamlit app

* Drop python 3.9 support (#615)

* drop python 3.9

* try python 3.12

* undo try python 3.12

* add lift tests check

* Add more content to the Gamma-Gamma Notebook  (#573)

* improve nb

* rm warnings and add link to lifetimes quickstart

* address comments

* feedback part 3

* remove warnings manually

* Add more content to the BG/NBD Notebook (#571)

* add more info to the notebook

* hide plots code

* fix plot y labels

* fix plot outputs and remove model build

* improve final note probability plots

* address comments

* use quickstart dataset

* feedback part 3

* remowe warnings manually

* feedback part 4

* Improve MMM Docs (#612)

* improve mmm docs init

* add more code examples to docstrings

* minor improvemeents

* typo

* better phrasing

* add thomas suggestion

* Fix `clv` plotting bugs and edits to Quickstart (#601)

* move fixtures to conftest

* docstrings and moved set_model_fit to conftest

* fixed pandas quickstart warnings

* revert to MockModel and add ParetoNBD support

* quickstart edit for issue 609

* notebook edit

* [pre-commit.ci] pre-commit autoupdate (#616)

* improve coords matching (#623)

* python 3.12 attempt (#618)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor(saturation): Using pymc-marketing saturation functions rather than coding my own: Removing tanh, logistic and michaelis menten

* refactor(saturation): Remove Hill and Root saturations, as they aren't supported by pymc-marketing currently

* refactor(geometric_adstock): Removing custom adstock and using pymc-marketing adstock function to demo decay. Also updating latex to align with pymc-marketing, where decay factor is represented by alpha rather than beta

* refactor(delayed_adstock): Using pymc-marketing delayed geometric function rather than custom one

* fix(requirements): Adding pymc-marketing to Streamlit requirements for deployment

* Added Dev Container Folder

* refactor(weibull_cdf): Using pymc-marketing function for Weibull CDF

* fix(weibull_cdf): Fixing incorrect dataframe var name for CDF plotting df

* refactor(weibull_pdf): Using pymc-marketing function for WeibullPDF

* refactor(custom_functions): Removing adstock_saturation_functions.py file now that it is no longer required

* chore: Removing devcontainer created by Streamlit

* fix(requirements): Adding preliz to requirements

* refactor(prior_viz): Reworking the prior visualisation to use Preliz instead of custom function, as well as remove the tab-design. Prior distributions can now be specified programmatically.

* refactor(prior_functions.py): Deleting the draw_samples function and replacing it with a programmatic PreliZ function, such that the distribution object is returned when the user passes in the name of a distribution

* fix(requirements): Delete obsolete pymc requirement, which should fix deployment dependency conflicts

* chore(readme): Updating with guidelines on how to add additional distributions or transformation functions to the app

* refactor(plot_config): Moving height and width specifications into constants at top of Adstock and Saturation files, so the plot sizes are set programmatically

---------

Co-authored-by: Juan Orduz <[email protected]>
Co-authored-by: Colt Allen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Trujillo <[email protected]>
twiecki pushed a commit that referenced this pull request Sep 10, 2024
* improve nb

* rm warnings and add link to lifetimes quickstart

* address comments

* feedback part 3

* remove warnings manually
twiecki pushed a commit that referenced this pull request Sep 10, 2024
* feat(streamlit_explainer): Pushing files for Streamlit explainer app, to illustrate saturation, adstock and prior concepts in an intuitive, visual way to stakeholders and new MMMers

* chore(readme): Adding a readme for the app

* fix(env): Updating dependencies to include those needed for the Streamlit app

* Drop python 3.9 support (#615)

* drop python 3.9

* try python 3.12

* undo try python 3.12

* add lift tests check

* Add more content to the Gamma-Gamma Notebook  (#573)

* improve nb

* rm warnings and add link to lifetimes quickstart

* address comments

* feedback part 3

* remove warnings manually

* Add more content to the BG/NBD Notebook (#571)

* add more info to the notebook

* hide plots code

* fix plot y labels

* fix plot outputs and remove model build

* improve final note probability plots

* address comments

* use quickstart dataset

* feedback part 3

* remowe warnings manually

* feedback part 4

* Improve MMM Docs (#612)

* improve mmm docs init

* add more code examples to docstrings

* minor improvemeents

* typo

* better phrasing

* add thomas suggestion

* Fix `clv` plotting bugs and edits to Quickstart (#601)

* move fixtures to conftest

* docstrings and moved set_model_fit to conftest

* fixed pandas quickstart warnings

* revert to MockModel and add ParetoNBD support

* quickstart edit for issue 609

* notebook edit

* [pre-commit.ci] pre-commit autoupdate (#616)

* improve coords matching (#623)

* python 3.12 attempt (#618)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor(saturation): Using pymc-marketing saturation functions rather than coding my own: Removing tanh, logistic and michaelis menten

* refactor(saturation): Remove Hill and Root saturations, as they aren't supported by pymc-marketing currently

* refactor(geometric_adstock): Removing custom adstock and using pymc-marketing adstock function to demo decay. Also updating latex to align with pymc-marketing, where decay factor is represented by alpha rather than beta

* refactor(delayed_adstock): Using pymc-marketing delayed geometric function rather than custom one

* fix(requirements): Adding pymc-marketing to Streamlit requirements for deployment

* Added Dev Container Folder

* refactor(weibull_cdf): Using pymc-marketing function for Weibull CDF

* fix(weibull_cdf): Fixing incorrect dataframe var name for CDF plotting df

* refactor(weibull_pdf): Using pymc-marketing function for WeibullPDF

* refactor(custom_functions): Removing adstock_saturation_functions.py file now that it is no longer required

* chore: Removing devcontainer created by Streamlit

* fix(requirements): Adding preliz to requirements

* refactor(prior_viz): Reworking the prior visualisation to use Preliz instead of custom function, as well as remove the tab-design. Prior distributions can now be specified programmatically.

* refactor(prior_functions.py): Deleting the draw_samples function and replacing it with a programmatic PreliZ function, such that the distribution object is returned when the user passes in the name of a distribution

* fix(requirements): Delete obsolete pymc requirement, which should fix deployment dependency conflicts

* chore(readme): Updating with guidelines on how to add additional distributions or transformation functions to the app

* refactor(plot_config): Moving height and width specifications into constants at top of Adstock and Saturation files, so the plot sizes are set programmatically

---------

Co-authored-by: Juan Orduz <[email protected]>
Co-authored-by: Colt Allen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Trujillo <[email protected]>
twiecki pushed a commit that referenced this pull request Sep 10, 2024
* improve nb

* rm warnings and add link to lifetimes quickstart

* address comments

* feedback part 3

* remove warnings manually
twiecki pushed a commit that referenced this pull request Sep 10, 2024
* feat(streamlit_explainer): Pushing files for Streamlit explainer app, to illustrate saturation, adstock and prior concepts in an intuitive, visual way to stakeholders and new MMMers

* chore(readme): Adding a readme for the app

* fix(env): Updating dependencies to include those needed for the Streamlit app

* Drop python 3.9 support (#615)

* drop python 3.9

* try python 3.12

* undo try python 3.12

* add lift tests check

* Add more content to the Gamma-Gamma Notebook  (#573)

* improve nb

* rm warnings and add link to lifetimes quickstart

* address comments

* feedback part 3

* remove warnings manually

* Add more content to the BG/NBD Notebook (#571)

* add more info to the notebook

* hide plots code

* fix plot y labels

* fix plot outputs and remove model build

* improve final note probability plots

* address comments

* use quickstart dataset

* feedback part 3

* remowe warnings manually

* feedback part 4

* Improve MMM Docs (#612)

* improve mmm docs init

* add more code examples to docstrings

* minor improvemeents

* typo

* better phrasing

* add thomas suggestion

* Fix `clv` plotting bugs and edits to Quickstart (#601)

* move fixtures to conftest

* docstrings and moved set_model_fit to conftest

* fixed pandas quickstart warnings

* revert to MockModel and add ParetoNBD support

* quickstart edit for issue 609

* notebook edit

* [pre-commit.ci] pre-commit autoupdate (#616)

* improve coords matching (#623)

* python 3.12 attempt (#618)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor(saturation): Using pymc-marketing saturation functions rather than coding my own: Removing tanh, logistic and michaelis menten

* refactor(saturation): Remove Hill and Root saturations, as they aren't supported by pymc-marketing currently

* refactor(geometric_adstock): Removing custom adstock and using pymc-marketing adstock function to demo decay. Also updating latex to align with pymc-marketing, where decay factor is represented by alpha rather than beta

* refactor(delayed_adstock): Using pymc-marketing delayed geometric function rather than custom one

* fix(requirements): Adding pymc-marketing to Streamlit requirements for deployment

* Added Dev Container Folder

* refactor(weibull_cdf): Using pymc-marketing function for Weibull CDF

* fix(weibull_cdf): Fixing incorrect dataframe var name for CDF plotting df

* refactor(weibull_pdf): Using pymc-marketing function for WeibullPDF

* refactor(custom_functions): Removing adstock_saturation_functions.py file now that it is no longer required

* chore: Removing devcontainer created by Streamlit

* fix(requirements): Adding preliz to requirements

* refactor(prior_viz): Reworking the prior visualisation to use Preliz instead of custom function, as well as remove the tab-design. Prior distributions can now be specified programmatically.

* refactor(prior_functions.py): Deleting the draw_samples function and replacing it with a programmatic PreliZ function, such that the distribution object is returned when the user passes in the name of a distribution

* fix(requirements): Delete obsolete pymc requirement, which should fix deployment dependency conflicts

* chore(readme): Updating with guidelines on how to add additional distributions or transformation functions to the app

* refactor(plot_config): Moving height and width specifications into constants at top of Adstock and Saturation files, so the plot sizes are set programmatically

---------

Co-authored-by: Juan Orduz <[email protected]>
Co-authored-by: Colt Allen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Trujillo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLV docs Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants