-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support New Data for MMM model #444
Conversation
if not isinstance(X, pd.DataFrame): | ||
raise TypeError( | ||
"X must be a pandas DataFrame in order to access the columns" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm viewing this as required in order to access the columns. Changing the type hint makes mypy mad
If there are any other suggestions, let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 99% of the cases people using this package use pandas, so is ok for me.
Failing test for the change in behavior of _data_setter. No longer supports np.ndarray as input with different EDIT: tests are updated to pass |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #444 +/- ##
==========================================
- Coverage 90.82% 90.67% -0.15%
==========================================
Files 21 21
Lines 1972 1994 +22
==========================================
+ Hits 1791 1808 +17
- Misses 181 186 +5 ☔ View full report in Codecov by Sentry. |
except KeyError as e: | ||
raise RuntimeError("New data must contain channel_data!", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be the error raises by any missing keys of the dataframe? For instance, date_column or control_columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so! I suggest that, for this iteration, we keep this strict and then wait for feedback.
Will rework the tests to include:
But also the missing Fourier feautre I believe there will be an issue if the new data doesn't have more than Edit: all these checks complete |
TypeError, | ||
match=r"The DType <class 'numpy.dtype\[datetime64\]'> could not be promoted by", | ||
): | ||
mmm.predict_posterior(X_pred=new_X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as noted in the comment, the ModelBuilder method will need an override if this is to work with dates in the input DataFrame
Hey mate, do you have a code snippet about how should work? What data should the data frame have to predict? Are you planning to add the example to the example notebook? (If you need, I can give you a hand with it) 🙌🏻 |
Good thinking. I will add some to the mmm_example.ipynb |
b1f4b90
to
c49ab8e
Compare
Had to do a force push with the rebases 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, It will perform all the transformation on new data including scaling. The caveat is that the predictions are not in their original scale.
My current goal is to get all the predict methods sorted out, but I'm currently in rebase hell from the last PR If it clears up, I will add a notebook too. But I'm thinking I will address that sometime in the future. Is that an issue if they come separately? I can add an example in the meantime |
I would squash those 15 commits first |
Thanks @wd60622 ! If its easier for you you can create another PR with a clean tree :) Regarding the notebook: sure, let's make it another different PR. I will test it anyway during my review 💪 |
X_pred = pd.DataFrame( | ||
{ | ||
"date": new_dates, | ||
"channel_1": rng.integers(low=0, high=400, size=n), | ||
"channel_2": rng.integers(low=0, high=50, size=n), | ||
"control_1": rng.gamma(shape=1000, scale=500, size=n), | ||
"control_2": rng.gamma(shape=100, scale=5, size=n), | ||
"other_column_1": rng.integers(low=0, high=100, size=n), | ||
"other_column_2": rng.normal(loc=0, scale=1, size=n), | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have this as a fixture and split this tests as it has many assert
s checking different functions?
I will start a new branch based using the feedback. @juanitorduz would you be able to share the mentioned method to bring the target back to the original scale for xarray DataArray. Is this what you had in mind? |
I have an alternative way using an array method, see https://juanitorduz.github.io/flax_numpyro/ I have no preference 😄 |
Perfect! Thank you for sending that over. I am closing this PR in favor of a cleaner branch. New PR: #482 |
Barebone implementation to support new data instead of the PR that removes the mixins
Mixins transformations from child classes can still be used if they exist down the line.
This PR introduces :
sample_posterior_predictive
method and check for failure ofpredict_posterior
method as design matrix has date type. To enable this, there would have to be relaxing of ModelBuilder validation or override the method📚 Documentation preview 📚: https://pymc-marketing--444.org.readthedocs.build/en/444/