Add posterior population sampling methods to `ParetoNBD` #401

ColtAllen · 2023-10-20T17:38:29Z

This PR addresses #319, #318, and #278 for ParetoNBDModel.

The most significant addition is the `distribution_customer_population` method, which can be used to plot the distribution of estimated customer purchase frequencies.
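Population-level sampling draws (recency, frequency) pairs for customers from the model's generative process. The sketch below shows that process in plain NumPy. It assumes the usual Pareto/NBD formulation: purchase rates ~ Gamma(r, alpha) and dropout rates ~ Gamma(s, beta), with alpha and beta treated as rate parameters (an assumption of this sketch). `simulate_pareto_nbd` is a hypothetical helper, not the PR's implementation, and the parameter values are illustrative only.

```python
import numpy as np


def simulate_pareto_nbd(r, alpha, s, beta, T, rng):
    """Hypothetical helper: draw (recency, frequency) pairs from a Pareto/NBD process.

    r, alpha: shape and rate of the Gamma distribution of purchase rates (lambda).
    s, beta:  shape and rate of the Gamma distribution of dropout rates (mu).
    T:        1-D array of observation-period lengths, one per customer.
    Returns shape (len(T), 2): [..., 0] is recency, [..., 1] is frequency.
    """
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    # NumPy's gamma takes a scale parameter, so scale = 1 / rate.
    lam = rng.gamma(shape=r, scale=1.0 / alpha, size=n)
    mu = rng.gamma(shape=s, scale=1.0 / beta, size=n)
    # Each customer becomes inactive after an exponential lifetime with rate mu.
    tau = rng.exponential(scale=1.0 / mu)
    # Purchases only happen while active and inside the observation window.
    active = np.minimum(tau, T)
    out = np.zeros((n, 2))
    for i in range(n):
        # Repeat purchases follow a Poisson process with rate lam[i] on [0, active[i]].
        k = rng.poisson(lam[i] * active[i])
        if k > 0:
            times = rng.uniform(0.0, active[i], size=k)
            out[i, 0] = times.max()  # recency: time of the last purchase
        out[i, 1] = k  # frequency: number of repeat purchases
    return out


rng = np.random.default_rng(42)
T = np.full(1_000, 40.0)
draws = simulate_pareto_nbd(0.55, 10.5, 0.6, 11.7, T, rng)
recency, frequency = draws[..., 0], draws[..., 1]
print(f"mean purchases per customer: {frequency.mean():.3f}")
```

A histogram of `frequency` from many such draws is the kind of population distribution the method is meant to plot.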

I also performed some maintenance, such as removing the "Experimental Status" user warning and cleaning up the unit tests and dev notebook. `build_model()` is now called automatically, and `ParetoNBDModel.fit()` returns `self`, in line with the sklearn convention.
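A minimal sketch of the general sklearn-style pattern described here, using a hypothetical `ToyRateModel` (not a pymc-marketing class): the constructor builds the model state itself, and `fit` returns `self` so construction, fitting, and prediction can be chained.

```python
import numpy as np


class ToyRateModel:
    """Hypothetical estimator illustrating the sklearn-style fit convention."""

    def __init__(self):
        # Build internal state up front so callers never need a separate build step.
        self.build_model()

    def build_model(self):
        self.rate_ = None

    def fit(self, frequency, T):
        # Pooled maximum-likelihood estimate of a Poisson purchase rate.
        self.rate_ = float(np.sum(frequency) / np.sum(T))
        return self  # returning self enables method chaining

    def expected_purchases(self, t):
        return self.rate_ * t


model = ToyRateModel().fit(frequency=[2, 0, 5], T=[10.0, 10.0, 10.0])
print(model.expected_purchases(5.0))
```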

I experimented with adding slice sampling in this PR. Although it was 3-4x faster than NUTS for this particular model, its performance was not as reliable as I'd hoped. In the future I'll create pytensor PRs enabling support for external samplers.
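For context on what slice sampling does: it needs only log-density evaluations, no gradients, which is one reason it can be cheaper per step than NUTS. The sketch below is a self-contained univariate slice sampler with stepping-out and shrinkage, following Neal (2003). It is not the code tried in this PR and not PyMC's `pm.Slice` step; `slice_sample` and its parameters are hypothetical.

```python
import math
import random


def slice_sample(logp, x0, n_samples, width=1.0, max_steps=50, seed=0):
    """Univariate slice sampler with stepping-out and shrinkage."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Auxiliary height on the log scale: log(u * p(x)) with u ~ U(0, 1].
        log_y = logp(x) + math.log(1.0 - rng.random())
        # Step out: randomly place an interval of the given width around x, then grow it.
        left = x - width * rng.random()
        right = left + width
        j = rng.randrange(max_steps)
        k = max_steps - 1 - j
        while j > 0 and logp(left) > log_y:
            left -= width
            j -= 1
        while k > 0 and logp(right) > log_y:
            right += width
            k -= 1
        # Shrinkage: sample uniformly in the interval, shrinking toward x on rejection.
        while True:
            x_new = rng.uniform(left, right)
            if logp(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return samples


# Target: a standard normal, specified only through its unnormalized log density.
draws = slice_sample(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean={mean:.3f}, var={var:.3f}")
```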

📚 Documentation preview 📚: https://pymc-marketing--401.org.readthedocs.build/en/401/


codecov · 2023-10-20T18:15:35Z

Codecov Report

Merging #401 (d51c814) into main (32098cc) will increase coverage by 0.03%.
Report is 2 commits behind head on main.
The diff coverage is 100.00%.

```diff
@@            Coverage Diff             @@
##             main     #401      +/-   ##
==========================================
+ Coverage   88.71%   88.74%   +0.03%
==========================================
  Files          21       21
  Lines        1869     1884      +15
==========================================
+ Hits         1658     1672      +14
- Misses        211      212       +1
```

| Files | Coverage Δ |
|---|---|
| pymc_marketing/clv/models/pareto_nbd.py | `94.08% <100.00%> (+0.74%)` ⬆️ |

... and 1 file with indirect coverage changes


ricardoV94 · 2023-10-23T14:54:12Z

pymc_marketing/clv/models/pareto_nbd.py

```diff
@@ -632,17 +646,23 @@ def _distribution_new_customers(
            s = pm.HalfFlat("s")
            beta = pm.HalfFlat("beta")

-            # This is the shape if using fit_method="map"
-            if self.fit_result.dims == {"chain": 1, "draw": 1}:
+            if shape_kwargs is None:
```


Why is `shape_kwargs` even needed in the first place? I didn't notice it being needed in any other CLV methods.

Admittedly, I rewrote this because that IF statement was preventing 100% test coverage. I don't see much use for it other than adjusting the shape parameter.

I guess the point here was to take more than a single draw when we have a MAP fit. But other than that, we shouldn't have to control the shape at all?
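A minimal NumPy sketch of why a MAP fit needs extra draws: the fit stores one point estimate per parameter with (chain, draw) = (1, 1), so to see a population distribution the single draw is expanded before sampling. The parameter values and `n_draws` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# A MAP fit holds one point estimate per parameter, shaped (chain=1, draw=1).
map_r = np.array([[0.55]])
map_alpha = np.array([[10.5]])

# Expand the single draw so the population can be sampled many times.
n_draws = 500
r = np.broadcast_to(map_r, (1, n_draws))
alpha = np.broadcast_to(map_alpha, (1, n_draws))

# One purchase-rate draw per (chain, draw) cell; NumPy's gamma takes scale = 1 / rate.
lam = rng.gamma(shape=r, scale=1.0 / alpha)
print(lam.shape)  # (1, 500)
```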

If you want to trigger this line in the tests, calling the method after a `fit(fit_method="map")` (or adding a corresponding fake fit result) should do it.

I could use some help fixing this: the mock idata object for an MCMC fit isn't triggering the IF statement in testing, so it has the same dims as a MAP fit. It seems to work fine when tested in the notebook, though.

Also, `distribution_customer_population` has an additional dimension due to the `T` parameter. I've added an argument to make this adjustable, but it may be more trouble than it's worth.

What are the dims of the fake MAP fit? You should have chain=1, draw=1. Perhaps you have to put one extra bracket around the parameters.
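The check in the diff compares the posterior dims to `{"chain": 1, "draw": 1}`. The hypothetical helper `is_map_like` below applies the same idea to plain arrays: wrapping each value in one extra pair of brackets gives it explicit (chain, draw) axes of size 1, while a fake MCMC result with 100 draws fails the check. The parameter values are illustrative.

```python
import numpy as np


def is_map_like(posterior):
    """Hypothetical helper: True if every parameter has leading (chain, draw) == (1, 1)."""
    return all(np.ndim(v) >= 2 and np.shape(v)[:2] == (1, 1) for v in posterior.values())


# Fake MAP result: the extra brackets give each parameter (chain=1, draw=1) axes.
fake_map = {"r": [[0.55]], "alpha": [[10.5]], "s": [[0.6]], "beta": [[11.7]]}
# Fake MCMC result: one chain with 100 draws per parameter.
rng = np.random.default_rng(1)
fake_mcmc = {name: rng.gamma(2.0, 1.0, size=(1, 100)) for name in fake_map}

print(is_map_like(fake_map))   # True
print(is_map_like(fake_mcmc))  # False
```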

Found the issue: the idata object wasn't being reset between the MCMC and MAP test parametrizations.

I had to increase the `rtol` to get the tests to pass. This makes sense for the MCMC fits because there are only 100 samples, but I'm not sure why it also had to be increased for MAP, since nothing else changed except the NumPy version in my dev environment.
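To put a number on the 100-sample point: for n independent Gamma(shape, rate) draws, the mean is shape/rate and the standard deviation is sqrt(shape)/rate, so the relative standard error of the sample mean is 1/sqrt(shape · n). The sketch below computes it and sizes `rtol` as a few standard errors. `gamma_mean_relative_se` is a hypothetical helper, and the values are illustrative. MCMC draws are autocorrelated, so their effective sample size is below n and the real error is larger still. This does not explain the MAP case.

```python
import numpy as np


def gamma_mean_relative_se(shape, n):
    """Relative standard error of the mean of n i.i.d. Gamma(shape, rate) draws.

    sd / (sqrt(n) * mean) = (sqrt(shape) / rate) / (sqrt(n) * shape / rate)
                          = 1 / sqrt(shape * n), independent of the rate.
    """
    return 1.0 / np.sqrt(shape * n)


shape, rate, n = 0.55, 10.5, 100
rel_se = gamma_mean_relative_se(shape, n)  # about 0.135 for these values
rng = np.random.default_rng(7)
draws = rng.gamma(shape, 1.0 / rate, size=n)
# A tolerance of a few standard errors absorbs Monte Carlo noise; a tight rtol would not.
np.testing.assert_allclose(draws.mean(), shape / rate, rtol=5 * rel_se)
```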


twiecki · 2023-10-31T16:11:07Z

ping @ricardoV94

ricardoV94 · 2023-11-01T11:52:47Z

pymc_marketing/clv/models/pareto_nbd.py

```diff
+            ParetoNBD(
+                name="customer_population",
+                r=r,
+                alpha=alpha,
+                s=s,
+                beta=beta,
+                T=T,
+            )
```


Hey @ColtAllen, we already got this merged, but perhaps we can improve it in a future PR. I'm thinking of splitting the recency and number of purchases into two separate variables instead of the current (n, 2) vector. Something like:

```python
pnbd = ParetoNBD(
    name="customer_behavior",
    r=r,
    alpha=alpha,
    s=s,
    beta=beta,
    T=T,
)
pm.Deterministic("customer_recency", pnbd[..., 0])
pm.Deterministic("customer_num_purchases", pnbd[..., 1])
```

It would also be good to add a `customer_id` dim to these variables. WDYT?
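A NumPy sketch of the proposed split: draws with a trailing axis of length 2 (recency, number of purchases) are separated with `[..., 0]` and `[..., 1]`, and each resulting variable keeps (chain, draw, customer) axes, so a `customer_id` dim fits directly. In PyMC this would correspond to passing `dims="customer_id"` to `pm.Deterministic` with matching model coords. The draws below are placeholders, not model output.

```python
import numpy as np

rng = np.random.default_rng(3)
n_chain, n_draw, n_customer = 2, 50, 4

# Placeholder draws, not produced by the model: purchases ~ Poisson, and recency is
# forced to 0 for customers with no purchases, as in the (recency, frequency) convention.
num_purchases = rng.poisson(2.0, size=(n_chain, n_draw, n_customer)).astype(float)
recency = rng.uniform(0.0, 30.0, size=num_purchases.shape) * (num_purchases > 0)

# Joint layout analogous to the current output: trailing axis of length 2.
pnbd_draws = np.stack([recency, num_purchases], axis=-1)

# Split along the last axis, mirroring pnbd[..., 0] and pnbd[..., 1] in the suggestion.
customer_recency = pnbd_draws[..., 0]
customer_num_purchases = pnbd_draws[..., 1]

# Each split variable keeps (chain, draw, customer) axes.
dims = ("chain", "draw", "customer_id")
print(dict(zip(dims, customer_recency.shape)))  # {'chain': 2, 'draw': 50, 'customer_id': 4}
```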

Opened issue #417.

ColtAllen added 10 commits October 18, 2023 18:14

- `build_model` call in subclass `__init__` (cc68e8b)
- clean up notebook (935badf)
- added slice sampling, map default fit_method (5f43fb8)
- Removed experimental warning (5d82866)
- cleaned up tests (302b201)
- added distribution_customer_population (971fd6a)
- docstrings and unit tests (3c9e16a)
- remove slice sampler, fit returns self (03fdcb3)
- remove slice sampler, fit returns self (c8e9fa1)
- docstrings (6c5af0b)

ColtAllen added the enhancement (New feature or request), CLV, and maintenance labels on Oct 20, 2023

ColtAllen requested a review from ricardoV94 October 20, 2023 17:38

ColtAllen self-assigned this Oct 20, 2023

ricardoV94 requested changes Oct 23, 2023


Revert sample_kwargs, build_model, and fit (ef45da2)

ricardoV94 reviewed Oct 24, 2023


pymc_marketing/clv/models/pareto_nbd.py (outdated, resolved)

ricardoV94 reviewed Oct 24, 2023


tests/clv/models/test_pareto_nbd.py (outdated, resolved)

ricardoV94 reviewed Oct 24, 2023


tests/clv/models/test_pareto_nbd.py (outdated, resolved)

ColtAllen added 2 commits October 24, 2023 12:45

- WIP test_posterior_distributions (d2b911f)
- fixed posterior tests (d51c814)

ricardoV94 approved these changes Nov 1, 2023


ricardoV94 changed the title from ~~ParetoNBD Posterior Population Sampling and Misc Changes~~ to Add posterior population sampling methods to `ParetoNBD` on Nov 1, 2023

twiecki merged commit 7eb2ecb into pymc-labs:main Nov 1, 2023
12 checks passed

ricardoV94 reviewed Nov 1, 2023


ColtAllen deleted the pareto_slice_sampler branch November 7, 2023 03:03


- ColtAllen commented Oct 20, 2023 (edited by github-actions bot)
- review-notebook-app bot commented Oct 20, 2023
- codecov bot commented Oct 20, 2023 (edited)
- ricardoV94, Oct 23, 2023 (edited)
- ColtAllen, Oct 23, 2023
- ricardoV94, Oct 23, 2023 (edited)
- ColtAllen, Oct 24, 2023 (edited)
- ricardoV94, Oct 25, 2023
- ColtAllen, Oct 25, 2023
- twiecki commented Oct 31, 2023
- ricardoV94, Nov 1, 2023 (edited)
- ricardoV94, Nov 1, 2023
