Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add posterior population sampling methods to ParetoNBD #401

Merged
merged 13 commits into from
Nov 1, 2023

Conversation

ColtAllen
Copy link
Collaborator

@ColtAllen ColtAllen commented Oct 20, 2023

This PR addresses #319, #318, and #278 for ParetoNBDModel.

The most significant addition is the distribution_customer_population method, which can be used to plot the distribution of estimated customer purchase frequencies:

image

I also performed some maintenance like removing the "Experimental Status" user warning, and cleaning up the unit tests and dev notebook. build_model() is now auto-called internally, and ParetoNBDModel.fit() also returns self, which is aligned with the sklearn convention.

I experimented with adding Slice Sampling for this PR. Although it is 3-4x faster than NUTS for this particular model, the performance was not as reliable as I'd hoped. In the future I'll create pytensor PRs enabling support for external samplers.


📚 Documentation preview 📚: https://pymc-marketing--401.org.readthedocs.build/en/401/

@ColtAllen ColtAllen added enhancement New feature or request CLV maintenance labels Oct 20, 2023
@ColtAllen ColtAllen self-assigned this Oct 20, 2023
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@codecov
Copy link

codecov bot commented Oct 20, 2023

Codecov Report

Merging #401 (d51c814) into main (32098cc) will increase coverage by 0.03%.
Report is 2 commits behind head on main.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #401      +/-   ##
==========================================
+ Coverage   88.71%   88.74%   +0.03%     
==========================================
  Files          21       21              
  Lines        1869     1884      +15     
==========================================
+ Hits         1658     1672      +14     
- Misses        211      212       +1     
Files Coverage Δ
pymc_marketing/clv/models/pareto_nbd.py 94.08% <100.00%> (+0.74%) ⬆️

... and 1 file with indirect coverage changes

pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
@@ -632,17 +646,23 @@ def _distribution_new_customers(
s = pm.HalfFlat("s")
beta = pm.HalfFlat("beta")

# This is the shape if using fit_method="map"
if self.fit_result.dims == {"chain": 1, "draw": 1}:
if shape_kwargs is None:
Copy link
Contributor

@ricardoV94 ricardoV94 Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is shape_kwargs even needed in the first place? I didn't notice it being needed in any other CLV methods

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, I rewrote this because that IF statement was preventing 100% testing coverage. I don't see much use for it other than adjusting the shape parameter.

Copy link
Contributor

@ricardoV94 ricardoV94 Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the point here was to take more than one single draw when we have a map fit. But other than that we shouldn't have to control the shape at all?

If you want to trigger this line in the tests calling the method after a fit(method="map") (or add a corresponding fake fit) result should do it.

Copy link
Collaborator Author

@ColtAllen ColtAllen Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use some help to fix this: the mock idata object for amcmc fit isn't triggering the IF statement in testing, so it has the same dims as amap fit. It seems to work fine when tested in the notebook though.

Also, distribution_customer_population has an additional dimension due to the T parameter. I've added an argument to make this adjustable, but it may be more trouble than it's worth.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the dims of the fake map fit? You should have chains=1, draws=1. Perhaps you have too put one extra bracket around the parameters

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found the issue - the idata object wasn't resetting between the mcmc and map testing parametrizations.

I had to increase the rtol to get the tests to pass. This makes sense for the mcmc fits because there are only 100 samples, but I'm not sure why it also had to be increased for map since nothing otherwise was changed except my dev env's version of numpy.

@twiecki
Copy link
Contributor

twiecki commented Oct 31, 2023

ping @ricardoV94

@ricardoV94 ricardoV94 changed the title ParetoNBD Posterior Population Sampling and Misc Changes Add posterior population sampling methods to ParetoNBD Nov 1, 2023
@twiecki twiecki merged commit 7eb2ecb into pymc-labs:main Nov 1, 2023
12 checks passed
Comment on lines +663 to +670
ParetoNBD(
name="customer_population",
r=r,
alpha=alpha,
s=s,
beta=beta,
T=T,
)
Copy link
Contributor

@ricardoV94 ricardoV94 Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ColtAllen we got this merged already but perhaps we can improve in a future PR. I am thinking of splitting the recency and number of purchases into two separate variables instead of having the current vector (n, 2)? Something like

pnbd = ParetoNBD(
    name="customer_behavior",
    r=r,
    alpha=alpha,
    s=s,
    beta=beta,
    T=T,
)

pm.Deterministic("customer_recency", pnbd[..., 0])
pm.Deterministic("customer_num_purchases", pnbd[..., 1])

Would also be good to add dims customer_id to these variables. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened issue in #417

@ColtAllen ColtAllen deleted the pareto_slice_sampler branch November 7, 2023 03:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLV enhancement New feature or request maintenance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants