Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support New Data for MMM model #444

Closed
wants to merge 15 commits into from

Conversation

wd60622
Copy link
Contributor

@wd60622 wd60622 commented Nov 24, 2023

Barebone implementation to support new data instead of the PR that removes the mixins

Mixins transformations from child classes can still be used if they exist down the line.

This PR introduces :

  • mutable coords for the date in order to support new data
  • transformations of channel, control, and date features (if exists) for new data
  • raise error in data setter method if not pandas DataFrame since columns various columns need to be accessed
  • tests for the sample_posterior_predictive method and check for failure of predict_posterior method as design matrix has date type. To enable this, there would have to be relaxing of ModelBuilder validation or override the method

📚 Documentation preview 📚: https://pymc-marketing--444.org.readthedocs.build/en/444/

Comment on lines +491 to +494
if not isinstance(X, pd.DataFrame):
raise TypeError(
"X must be a pandas DataFrame in order to access the columns"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm viewing this as required in order to access the columns. Changing the type hint makes mypy mad

If there are any other suggestions, let me know

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 99% of the cases people using this package use pandas, so is ok for me.

@wd60622
Copy link
Contributor Author

wd60622 commented Nov 24, 2023

Failing test for the change in behavior of _data_setter. No longer supports np.ndarray as input with different

EDIT: tests are updated to pass

Copy link

codecov bot commented Nov 24, 2023

Codecov Report

Attention: Patch coverage is 88.57143% with 4 lines in your changes missing coverage. Please review.

Project coverage is 90.67%. Comparing base (890c469) to head (ba62ce1).
Report is 253 commits behind head on main.

Files with missing lines Patch % Lines
pymc_marketing/mmm/delayed_saturated_mmm.py 88.57% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #444      +/-   ##
==========================================
- Coverage   90.82%   90.67%   -0.15%     
==========================================
  Files          21       21              
  Lines        1972     1994      +22     
==========================================
+ Hits         1791     1808      +17     
- Misses        181      186       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +500 to +501
except KeyError as e:
raise RuntimeError("New data must contain channel_data!", e)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be the error raises by any missing keys of the dataframe? For instance, date_column or control_columns

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so! I suggest that, for this iteration, we keep this strict and then wait for feedback.

@wd60622
Copy link
Contributor Author

wd60622 commented Dec 2, 2023

Will rework the tests to include:

  • old dates only
  • old and new dates
  • new dates only

But also the missing Fourier feautre

I believe there will be an issue if the new data doesn't have more than max_adstock rows too

Edit: all these checks complete

TypeError,
match=r"The DType <class 'numpy.dtype\[datetime64\]'> could not be promoted by",
):
mmm.predict_posterior(X_pred=new_X)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as noted in the comment, the ModelBuilder method will need an override if this is to work with dates in the input DataFrame

@cetagostini
Copy link
Contributor

cetagostini commented Dec 7, 2023

Hey mate, do you have a code snippet about how should work? What data should the data frame have to predict? Are you planning to add the example to the example notebook? (If you need, I can give you a hand with it) 🙌🏻

@wd60622
Copy link
Contributor Author

wd60622 commented Dec 9, 2023

Hey mate, do you have a code snippet about how should work? What data should the data frame to predict? Are you planning to add the example to the example notebook? (If you need, I can give you a hand with it) 🙌🏻

Good thinking. I will add some to the mmm_example.ipynb

@wd60622
Copy link
Contributor Author

wd60622 commented Dec 20, 2023

Had to do a force push with the rebases 🙏

Copy link
Collaborator

@juanitorduz juanitorduz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wd60622 This looks great! I left some comments regarding your questions! Just ping me when you would like a more detailed review.

This PR solves #450 right?

Also, will you add this extension to the MMM example notebook as part of this PR?

@wd60622
Copy link
Contributor Author

wd60622 commented Dec 28, 2023

@wd60622 This looks great! I left some comments regarding your questions! Just ping me when you would like a more detailed review.

This PR solves #450 right?

Yes, It will perform all the transformation on new data including scaling. The caveat is that the predictions are not in their original scale.

Also, will you add this extension to the MMM example notebook as part of this PR?

My current goal is to get all the predict methods sorted out, but I'm currently in rebase hell from the last PR

If it clears up, I will add a notebook too. But I'm thinking I will address that sometime in the future. Is that an issue if they come separately? I can add an example in the meantime

@ricardoV94
Copy link
Contributor

but I'm currently in rebase hell from the last PR

I would squash those 15 commits first

@wd60622 wd60622 mentioned this pull request Jan 4, 2024
@juanitorduz
Copy link
Collaborator

Thanks @wd60622 ! If its easier for you you can create another PR with a clean tree :)

Regarding the notebook: sure, let's make it another different PR. I will test it anyway during my review 💪

Comment on lines +563 to +573
X_pred = pd.DataFrame(
{
"date": new_dates,
"channel_1": rng.integers(low=0, high=400, size=n),
"channel_2": rng.integers(low=0, high=50, size=n),
"control_1": rng.gamma(shape=1000, scale=500, size=n),
"control_2": rng.gamma(shape=100, scale=5, size=n),
"other_column_1": rng.integers(low=0, high=100, size=n),
"other_column_2": rng.normal(loc=0, scale=1, size=n),
}
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have this as a fixture and split this tests as it has many asserts checking different functions?

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 11, 2024

I will start a new branch based using the feedback.

@juanitorduz would you be able to share the mentioned method to bring the target back to the original scale for xarray DataArray. Is this what you had in mind?

@juanitorduz
Copy link
Collaborator

@juanitorduz
Copy link
Collaborator

I have an alternative way using an array method, see https://juanitorduz.github.io/flax_numpyro/ I have no preference 😄

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 11, 2024

I have an alternative way using an array method, see https://juanitorduz.github.io/flax_numpyro/ I have no preference 😄

Perfect! Thank you for sending that over.

I am closing this PR in favor of a cleaner branch.
Already add the original scale back in 😄 This was super helpful

New PR: #482

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants