-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add posterior population sampling methods to ParetoNBD
#401
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Codecov Report
@@ Coverage Diff @@
## main #401 +/- ##
==========================================
+ Coverage 88.71% 88.74% +0.03%
==========================================
Files 21 21
Lines 1869 1884 +15
==========================================
+ Hits 1658 1672 +14
- Misses 211 212 +1
|
@@ -632,17 +646,23 @@ def _distribution_new_customers( | |||
s = pm.HalfFlat("s") | |||
beta = pm.HalfFlat("beta") | |||
|
|||
# This is the shape if using fit_method="map" | |||
if self.fit_result.dims == {"chain": 1, "draw": 1}: | |||
if shape_kwargs is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is shape_kwargs
even needed in the first place? I didn't notice it being needed in any other CLV methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Admittedly, I rewrote this because that IF statement was preventing 100% testing coverage. I don't see much use for it other than adjusting the shape
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the point here was to take more than one single draw when we have a map
fit. But other than that we shouldn't have to control the shape at all?
If you want to trigger this line in the tests calling the method after a fit(method="map")
(or add a corresponding fake fit) result should do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could use some help to fix this: the mock idata object for amcmc
fit isn't triggering the IF statement in testing, so it has the same dims as amap
fit. It seems to work fine when tested in the notebook though.
Also, distribution_customer_population
has an additional dimension due to the T
parameter. I've added an argument to make this adjustable, but it may be more trouble than it's worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the dims of the fake map fit? You should have chains=1, draws=1. Perhaps you have too put one extra bracket around the parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found the issue - the idata object wasn't resetting between the mcmc
and map
testing parametrizations.
I had to increase the rtol
to get the tests to pass. This makes sense for the mcmc
fits because there are only 100 samples, but I'm not sure why it also had to be increased for map
since nothing otherwise was changed except my dev env's version of numpy
.
ping @ricardoV94 |
ParetoNBD
Posterior Population Sampling and Misc ChangesParetoNBD
ParetoNBD( | ||
name="customer_population", | ||
r=r, | ||
alpha=alpha, | ||
s=s, | ||
beta=beta, | ||
T=T, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ColtAllen we got this merged already but perhaps we can improve in a future PR. I am thinking of splitting the recency and number of purchases into two separate variables instead of having the current vector (n, 2)? Something like
pnbd = ParetoNBD(
name="customer_behavior",
r=r,
alpha=alpha,
s=s,
beta=beta,
T=T,
)
pm.Deterministic("customer_recency", pnbd[..., 0])
pm.Deterministic("customer_num_purchases", pnbd[..., 1])
Would also be good to add dims customer_id
to these variables. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened issue in #417
This PR addresses #319, #318, and #278 for
ParetoNBDModel
.The most significant addition is the
distribution_customer_population
method, which can be used to plot the distribution of estimated customer purchase frequencies:I also performed some maintenance like removing the "Experimental Status" user warning, and cleaning up the unit tests and dev notebook.
build_model()
is now auto-called internally, andParetoNBDModel.fit()
also returnsself
, which is aligned with thesklearn
convention.I experimented with adding Slice Sampling for this PR. Although it is 3-4x faster than NUTS for this particular model, the performance was not as reliable as I'd hoped. In the future I'll create
pytensor
PRs enabling support for external samplers.📚 Documentation preview 📚: https://pymc-marketing--401.org.readthedocs.build/en/401/