Support New Data for MMM model #444

wd60622 · 2023-11-24T14:01:58Z

Bare-bones implementation to support new data, as an alternative to the PR that removes the mixins.

Mixin transformations from child classes can still be used if they exist down the line.

This PR introduces:

mutable coords for the date in order to support new data
transformations of channel, control, and date features (if they exist) for new data
an error raised in the data setter method if the input is not a pandas DataFrame, since various columns need to be accessed
tests for the sample_posterior_predictive method, plus a check that the predict_posterior method fails because the design matrix has a date type. To enable this, the ModelBuilder validation would have to be relaxed or the method overridden

📚 Documentation preview 📚: https://pymc-marketing--444.org.readthedocs.build/en/444/

wd60622 · 2023-11-24T14:48:42Z

pymc_marketing/mmm/delayed_saturated_mmm.py

+        if not isinstance(X, pd.DataFrame):
+            raise TypeError(
+                "X must be a pandas DataFrame in order to access the columns"
+            )


I'm viewing this as required in order to access the columns. Changing the type hint makes mypy mad

If there are any other suggestions, let me know

I think 99% of the cases people using this package use pandas, so is ok for me.
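For readers following along, here is a minimal sketch of why the check is needed. It is not the code in this PR: `select_model_columns` and the column names are hypothetical, and only the error message is taken from the diff above.

```python
import numpy as np
import pandas as pd


def select_model_columns(X, date_column, channel_columns):
    """Hypothetical helper: validate the input type, then access columns by name."""
    # A NumPy array has no column labels, so name-based access cannot work.
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X must be a pandas DataFrame in order to access the columns")
    return X[date_column], X[channel_columns]


X = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=3, freq="D"),
        "channel_1": [10.0, 20.0, 30.0],
        "channel_2": [1.0, 2.0, 3.0],
    }
)
dates, channels = select_model_columns(X, "date", ["channel_1", "channel_2"])

# Passing an ndarray is rejected up front instead of failing later on X["date"].
try:
    select_model_columns(np.zeros((3, 2)), "date", ["channel_1"])
except TypeError as err:
    message = str(err)
```

Failing early with a `TypeError` keeps the message close to the cause, which seems preferable to an indexing error deeper in the transformations.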

wd60622 · 2023-11-24T15:15:55Z

Failing test for the change in behavior of _data_setter. No longer supports np.ndarray as input with different

EDIT: tests are updated to pass

codecov · 2023-11-24T18:03:40Z

Codecov Report

Attention: Patch coverage is 88.57143% with 4 lines in your changes missing coverage. Please review.

Project coverage is 90.67%. Comparing base (890c469) to head (ba62ce1).
Report is 253 commits behind head on main.

Files with missing lines	Patch %	Lines
pymc_marketing/mmm/delayed_saturated_mmm.py	88.57%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #444      +/-   ##
==========================================
- Coverage   90.82%   90.67%   -0.15%     
==========================================
  Files          21       21              
  Lines        1972     1994      +22     
==========================================
+ Hits         1791     1808      +17     
- Misses        181      186       +5

wd60622 · 2023-11-24T18:24:28Z

pymc_marketing/mmm/delayed_saturated_mmm.py

+        except KeyError as e:
+            raise RuntimeError("New data must contain channel_data!", e)


Should this be the error raised for any missing key of the DataFrame? For instance, date_column or control_columns

I think so! I suggest that, for this iteration, we keep this strict and then wait for feedback.
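A rough sketch of what the strict behavior could look like for any required column, not only the channels. The helper name `get_required_columns` and the message wording are assumptions for illustration; the `from e` chaining is one idiomatic way to keep the original `KeyError` in the traceback.

```python
import pandas as pd


def get_required_columns(X, required):
    """Hypothetical helper: return the required columns or fail with a clear message."""
    try:
        return X[list(required)]
    except KeyError as e:
        # Report every missing column, and chain the original KeyError.
        missing = [col for col in required if col not in X.columns]
        raise RuntimeError(f"New data is missing required columns: {missing}") from e


X_new = pd.DataFrame({"date": pd.to_datetime(["2024-01-01"]), "channel_1": [5.0]})

try:
    get_required_columns(X_new, ["date", "channel_1", "control_1"])
except RuntimeError as err:
    error = err
```

Here `error.__cause__` is the underlying `KeyError`, so no information is lost by raising the friendlier `RuntimeError`.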

wd60622 · 2023-12-02T09:17:53Z

Will rework the tests to include:

old dates only
old and new dates
new dates only

But also the missing Fourier feature

I believe there will be an issue if the new data doesn't have more than max_adstock rows too

Edit: all these checks complete
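The three date scenarios listed above could be generated along these lines. This is only a sketch: `prediction_date_ranges`, the weekly frequency, and the example dates are assumptions, not the fixtures used in the test suite.

```python
import pandas as pd


def prediction_date_ranges(train_dates, n_new, freq="W-MON"):
    """Hypothetical helper: build the three date scenarios for prediction tests."""
    last = train_dates[-1]
    # Start at the last training date and drop it, so only later dates remain.
    new_only = pd.date_range(start=last, periods=n_new + 1, freq=freq)[1:]
    old_only = train_dates[-n_new:]
    old_and_new = old_only.append(new_only)
    return {"old": old_only, "old_and_new": old_and_new, "new": new_only}


train_dates = pd.date_range("2023-01-02", periods=10, freq="W-MON")
ranges = prediction_date_ranges(train_dates, n_new=3)
```

Each entry of `ranges` can then be used as the date column of a prediction DataFrame, for example through a parametrized test.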

wd60622 · 2023-12-03T16:25:11Z

tests/mmm/test_delayed_saturated_mmm.py

+            TypeError,
+            match=r"The DType <class 'numpy.dtype\[datetime64\]'> could not be promoted by",
+        ):
+            mmm.predict_posterior(X_pred=new_X)


As noted in the comment, the ModelBuilder method will need an override if this is to work with dates in the input DataFrame
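To illustrate the kind of failure matched in the test above: NumPy cannot find a common dtype for datetime64 and float64. The sketch below only demonstrates that promotion rule and one possible workaround (dropping date columns before numeric validation); where exactly ModelBuilder triggers the error is not shown here, and `can_promote` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def can_promote(*dtypes):
    """Hypothetical helper: report whether NumPy can find a common dtype."""
    try:
        np.result_type(*dtypes)
    except TypeError:
        return False
    return True


X = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=3, freq="D"),
        "channel_1": [1.0, 2.0, 3.0],
    }
)

# datetime64 and float64 have no common dtype, so promotion fails.
date_and_float = can_promote(X["date"].dtype, X["channel_1"].dtype)

# One possible workaround: validate only the non-date columns.
numeric_only = X.select_dtypes(exclude=["datetime"])
```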

cetagostini · 2023-12-07T21:52:13Z

Hey mate, do you have a code snippet showing how this should work? What data should the data frame have in order to predict? Are you planning to add the example to the example notebook? (If you need, I can give you a hand with it) 🙌🏻

wd60622 · 2023-12-09T08:15:20Z

Hey mate, do you have a code snippet showing how this should work? What data should the data frame have in order to predict? Are you planning to add the example to the example notebook? (If you need, I can give you a hand with it) 🙌🏻

Good thinking. I will add some to the mmm_example.ipynb

wd60622 · 2023-12-20T18:50:06Z

Had to do a force push with the rebases 🙏

pymc_marketing/mmm/delayed_saturated_mmm.py

juanitorduz

@wd60622 This looks great! I left some comments regarding your questions! Just ping me when you would like a more detailed review.

This PR solves #450 right?

Also, will you add this extension to the MMM example notebook as part of this PR?

wd60622 · 2023-12-28T06:57:52Z

@wd60622 This looks great! I left some comments regarding your questions! Just ping me when you would like a more detailed review.

This PR solves #450 right?

Yes, it will perform all the transformations on new data, including scaling. The caveat is that the predictions are not in their original scale.

Also, will you add this extension to the MMM example notebook as part of this PR?

My current goal is to get all the predict methods sorted out, but I'm currently in rebase hell from the last PR

If it clears up, I will add a notebook too. But I'm thinking I will address that sometime in the future. Is that an issue if they come separately? I can add an example in the meantime

ricardoV94 · 2023-12-28T13:07:07Z

but I'm currently in rebase hell from the last PR

I would squash those 15 commits first

juanitorduz · 2024-01-07T21:15:18Z

Thanks @wd60622! If it's easier for you, you can create another PR with a clean tree :)

Regarding the notebook: sure, let's make it a separate PR. I will test it anyway during my review 💪

juanitorduz · 2024-01-10T13:27:00Z

tests/mmm/test_delayed_saturated_mmm.py

+        X_pred = pd.DataFrame(
+            {
+                "date": new_dates,
+                "channel_1": rng.integers(low=0, high=400, size=n),
+                "channel_2": rng.integers(low=0, high=50, size=n),
+                "control_1": rng.gamma(shape=1000, scale=500, size=n),
+                "control_2": rng.gamma(shape=100, scale=5, size=n),
+                "other_column_1": rng.integers(low=0, high=100, size=n),
+                "other_column_2": rng.normal(loc=0, scale=1, size=n),
+            }
+        )


Could we have this as a fixture and split this test, since it has many asserts checking different functions?
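One way the suggestion might look, as a sketch only. The builder `make_X_pred`, the fixture name, the seed, and the two small tests are hypothetical; the column definitions are copied from the diff above.

```python
import numpy as np
import pandas as pd
import pytest


def make_X_pred(seed=0, n=8, start="2024-01-01", freq="W-MON"):
    """Hypothetical builder for the prediction DataFrame used across tests."""
    rng = np.random.default_rng(seed)
    new_dates = pd.date_range(start=start, periods=n, freq=freq)
    return pd.DataFrame(
        {
            "date": new_dates,
            "channel_1": rng.integers(low=0, high=400, size=n),
            "channel_2": rng.integers(low=0, high=50, size=n),
            "control_1": rng.gamma(shape=1000, scale=500, size=n),
            "control_2": rng.gamma(shape=100, scale=5, size=n),
            "other_column_1": rng.integers(low=0, high=100, size=n),
            "other_column_2": rng.normal(loc=0, scale=1, size=n),
        }
    )


@pytest.fixture
def X_pred():
    # Shared input, so each test can focus on a single method.
    return make_X_pred()


def test_X_pred_has_model_columns(X_pred):
    assert {"date", "channel_1", "channel_2"} <= set(X_pred.columns)


def test_X_pred_dates_are_sorted(X_pred):
    assert X_pred["date"].is_monotonic_increasing
```

Keeping the builder as a plain function also makes it easy to create variants (different `n` or start date) without duplicating the column definitions.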

wd60622 · 2024-01-11T16:15:19Z

I will start a new branch based on the feedback.

@juanitorduz would you be able to share the mentioned method to bring the target back to the original scale for an xarray DataArray? Is this what you had in mind?

juanitorduz · 2024-01-11T16:19:33Z

This is the method https://github.com/pymc-labs/pymc-marketing/blob/main/pymc_marketing/mmm/base.py#L478

juanitorduz · 2024-01-11T16:20:16Z

I have an alternative way using an array method, see https://juanitorduz.github.io/flax_numpyro/ I have no preference 😄

wd60622 · 2024-01-11T19:26:26Z

I have an alternative way using an array method, see https://juanitorduz.github.io/flax_numpyro/ I have no preference 😄

Perfect! Thank you for sending that over.

I am closing this PR in favor of a cleaner branch.
I already added the original scale back in 😄 This was super helpful.

New PR: #482
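For context on "original scale": a minimal NumPy sketch of the idea, assuming the target was scaled by its maximum absolute value before fitting. This is not the method linked above nor the code in #482; `max_abs_scale` and `to_original_scale` are hypothetical names and the numbers are made up.

```python
import numpy as np


def max_abs_scale(y):
    """Scale y into [-1, 1] by its maximum absolute value (assumed transformation)."""
    scale = np.max(np.abs(y))
    return y / scale, scale


def to_original_scale(samples, scale):
    """Undo max-abs scaling; broadcasting handles any leading sample dimensions."""
    return samples * scale


y = np.array([120.0, 300.0, 240.0])
y_scaled, scale = max_abs_scale(y)

# Stand-in for posterior predictive draws with shape (draw, date).
draws = np.stack([y_scaled * 0.9, y_scaled * 1.1])
draws_original = to_original_scale(draws, scale)
```

Because the inverse is a single multiplication, the same approach should carry over to an xarray DataArray without reshaping, as long as the scale is a scalar.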

wd60622 added 2 commits November 24, 2023 14:57

support for mixins

27f8ce2

test predict method

72af239

wd60622 commented Nov 24, 2023

wd60622 added 3 commits November 24, 2023 16:55

fix tests for data setter

6af2fda

property tests

1fbb966

carry along dtype from fit

b3484fc

reduce level of indent

3b292e3

wd60622 commented Nov 24, 2023

wd60622 mentioned this pull request Dec 2, 2023

sample_posterior_predictive doesn't scale X data #450

Closed

test different dates and different models

4626348

wd60622 commented Dec 3, 2023

remove the unused fixture

2cd0b31

wd60622 added 2 commits December 20, 2023 18:26

consolidate tests

f99600b

support for predict_posterior method with adstock effects

c49ab8e

wd60622 force-pushed the support-new-data branch from b1f4b90 to c49ab8e Compare December 20, 2023 18:47

wd60622 commented Dec 20, 2023

juanitorduz reviewed Dec 20, 2023

juanitorduz reviewed Dec 20, 2023

wd60622 added 5 commits December 28, 2023 07:19

resolve conflict

f85a375

resolve another conflict

f8f2a8e

test different dates and different models

91a15c6

rename variable

f41a1f2

latest method

ba62ce1

wd60622 mentioned this pull request Jan 4, 2024

scaling control vars #472

Closed

juanitorduz reviewed Jan 10, 2024

wd60622 closed this Jan 11, 2024

wd60622 deleted the support-new-data branch January 11, 2024 19:34

wd60622 mentioned this pull request Jan 12, 2024

Handle new data correctly and extend functionality of MMM posterior predictive methods #482

Merged

13 tasks

wd60622 added the MMM label Sep 7, 2024
