Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Covariates to ParetoNBDModel #463

Closed
wants to merge 28 commits into from

Conversation

ColtAllen
Copy link
Collaborator

@ColtAllen ColtAllen commented Dec 13, 2023

Description

This PR provides the option of using time-invariant covariates with ParetoNBDModel. This is a draft PR for now because I still need to parametrize some tests and add a few other things, but more importantly get opinions on two points:

Covariates require scaling for model convergence, preferably with MinMaxScaler so the coefficients are interpretable. This can easily be added as a utility function, and the covariate Min/Max values saved as model.idata attributes for out-of-sample predictions. Should a scaler be added in this PR, or a future one? Should it be added to ParetoNBDModel , or the base CLVModel?

Covariates for existing customers will also require modifications to the "new customer" methods. There are three options for this:

  1. Just raise a ValueError indicating these methods aren't supported with covariates (easy, but not my preference.)
  2. Average all the customer-specific alpha and beta parameters together for new customer estimates, which is a subset of the next option.
  3. Add arguments to these methods for user-specified covariate values. This will be involved and maybe even clunky to use in practice, but if a user wants to do sensitivity analysis on how covariates can impact new customer estimates, this is the way to go.

Regardless of the approach, should it added to this PR, or another one? This one is already rather lengthy.

Related Issue

Checklist

Modules affected

  • MMM
  • CLV

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--463.org.readthedocs.build/en/463/

@ColtAllen ColtAllen added enhancement New feature or request CLV request discussion major API breaking changes labels Dec 13, 2023
@ColtAllen ColtAllen self-assigned this Dec 13, 2023
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link

codecov bot commented Dec 13, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (a224cfe) 91.36% compared to head (5546f1f) 90.07%.

❗ Current head 5546f1f differs from pull request most recent head a63b0bd. Consider uploading reports for the commit a63b0bd to get more accurate results

Files Patch % Lines
pymc_marketing/clv/models/pareto_nbd.py 80.00% 20 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #463      +/-   ##
==========================================
- Coverage   91.36%   90.07%   -1.30%     
==========================================
  Files          21       21              
  Lines        2073     2025      -48     
==========================================
- Hits         1894     1824      -70     
- Misses        179      201      +22     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ColtAllen
Copy link
Collaborator Author

Tests are all passing, albeit incomplete. The CI seems to have timed out on an MMM test.

@twiecki
Copy link
Contributor

twiecki commented Jan 8, 2024

Ready for review?

@twiecki twiecki marked this pull request as ready for review January 8, 2024 07:17
@ColtAllen
Copy link
Collaborator Author

ColtAllen commented Jan 8, 2024

Ready for review?

I'm gonna save the covariate scaling and new customer modifications for future PRs. Not all tests are parametrized yet for covariates, and predictive methods will fail for out-of-sample predictions.

@ColtAllen
Copy link
Collaborator Author

@ricardoV94 I could use your help with an xarray dimensional error in the expected_purchases and expected_probability_alive methods:

 ValueError: cannot reindex or align along dimension 'customer_id' 
because of conflicting dimension sizes: {2356, 2357} 
(note: an index is found along that dimension with size=2356)

Otherwise, this is ready for review.

"""
Utility function to process covariates into model parameters.
"""
# TODO: broadcasting needs to be resolved for out-of-sample data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to add a Warning to handle this TODO in case is not part of the PR?

@juanitorduz
Copy link
Collaborator

@ColtAllen This is very cool! I need (and will!) to catch up with the development! I left some basic comments on the PR. I hope I can contribute more once I read the model details 🙏

@ricardoV94 ricardoV94 removed the major API breaking changes label Jan 18, 2024
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
@@ -265,6 +356,8 @@ def test_expected_probability_alive(self):
est_prob_alive_t = self.model.expected_probability_alive(future_t=4.5)
assert est_prob_alive.mean() > est_prob_alive_t.mean()

self.covar_model.expected_probability_alive()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test something about this. Doesn't the same question of covariates show up here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xarray issue needs to be resolved first (see my latest comment).

@@ -290,6 +383,8 @@ def test_expected_purchase_probability(self, test_n, test_t):
rtol=0.001,
)

self.model.expected_purchase_probability(test_n, test_t, self.data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean self.covar_model? Also need to test something about the output

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xarray issue needs to be resolved first (see my latest comment).

tests/clv/models/test_pareto_nbd.py Show resolved Hide resolved
@ricardoV94
Copy link
Contributor

  • Just raise a ValueError indicating these methods aren't supported with covariates (easy, but not my preference.)

  • Average all the customer-specific alpha and beta parameters together for new customer estimates, which is a subset of the next option.

  • Add arguments to these methods for user-specified covariate values. This will be involved and maybe even clunky to use in practice, but if a user wants to do sensitivity analysis on how covariates can impact new customer estimates, this is the way to go.

I don't like option 2. I would say 1 if you want to get this merged quickly, and 3 otherwise (or later)

@ricardoV94
Copy link
Contributor

@ricardoV94 I could use your help with an xarray dimensional error in the expected_purchases and expected_probability_alive methods:

 ValueError: cannot reindex or align along dimension 'customer_id' 
because of conflicting dimension sizes: {2356, 2357} 
(note: an index is found along that dimension with size=2356)

Otherwise, this is ready for review.

What code can I look at that runs this error? The dev notebook/ a specific test?

@ColtAllen
Copy link
Collaborator Author

 ValueError: cannot reindex or align along dimension 'customer_id' 
because of conflicting dimension sizes: {2356, 2357} 
(note: an index is found along that dimension with size=2356)

What code can I look at that runs this error? The dev notebook/ a specific test?

Tests are failing at these lines:
https://github.com/ColtAllen/pymc-marketing/blob/pareto_covar/pymc_marketing/clv/models/pareto_nbd.py#L687

https://github.com/ColtAllen/pymc-marketing/blob/pareto_covar/pymc_marketing/clv/models/pareto_nbd.py#L639

https://github.com/ColtAllen/pymc-marketing/blob/pareto_covar/pymc_marketing/clv/models/pareto_nbd.py#L554

@ColtAllen
Copy link
Collaborator Author

I don't know how important this is to resolve, but PyCharm is highlighting this inconsistency:

Signature of method 'ParetoNBDModel.build_model()' does not match signature of the base method in class 'ModelBuilder' 
 Inspection info:
Reports inconsistencies in overriding method signatures.
Example:
class Book:
    def add_title(self):
        pass


class Novel(Book):
    def add_title(self, text):
        pass
Parameters of the add_title method in the Novel class do not match the method signature specified in the Book class. As a fix, the IDE offers to apply the Change Signature refactoring.

@ColtAllen ColtAllen mentioned this pull request Feb 4, 2024
11 tasks
@ricardoV94 ricardoV94 mentioned this pull request Feb 22, 2024
13 tasks
@ColtAllen ColtAllen deleted the pareto_covar branch June 1, 2024 05:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants