Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add datasetter for date and fourier #405

Closed
wants to merge 8 commits into from

Conversation

wd60622
Copy link
Contributor

@wd60622 wd60622 commented Oct 21, 2023

Addresses #400


📚 Documentation preview 📚: https://pymc-marketing--405.org.readthedocs.build/en/405/

@wd60622
Copy link
Contributor Author

wd60622 commented Oct 21, 2023

WIP. Will add some tests for the different scenarios that were described

@codecov
Copy link

codecov bot commented Oct 21, 2023

Codecov Report

Merging #405 (9383e76) into main (916dce4) will decrease coverage by 0.08%.
The diff coverage is 85.18%.

❗ Current head 9383e76 differs from pull request most recent head 65dccfa. Consider uploading reports for the commit 65dccfa to get more accurate results

@@            Coverage Diff             @@
##             main     #405      +/-   ##
==========================================
- Coverage   88.67%   88.59%   -0.08%     
==========================================
  Files          21       21              
  Lines        1880     1885       +5     
==========================================
+ Hits         1667     1670       +3     
- Misses        213      215       +2     
Files Coverage Δ
pymc_marketing/mmm/transformers.py 94.87% <ø> (ø)
pymc_marketing/mmm/utils.py 90.38% <100.00%> (+0.80%) ⬆️
pymc_marketing/mmm/delayed_saturated_mmm.py 98.14% <82.60%> (-1.86%) ⬇️

... and 1 file with indirect coverage changes

Copy link
Contributor Author

@wd60622 wd60622 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some questions about the intended behavior

Also, there is some heavy overlap between _generate_and_preprocess_model_data and this method now and I think much could be combined

Comment on lines 494 to 495
if isinstance(X, np.ndarray):
return X
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should support ndarray when we want to select different sets of columns. Unless we want to support the case where only costs are added. i.e
isinstance(X, np.ndarray) and self.control_columns is None and self.yearly_seasonality is None

I think that this can simplify the code heavily

try:
new_channel_data = X[self.channel_columns].to_numpy()
except KeyError as e:
raise RuntimeError("New data must contain channel_data!", e)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataFrames already raise KeyErrors can that be used instead? What's more important: switching to RuntimeError or having an informative error message

@@ -67,7 +67,7 @@ def batched_convolution(x, w, axis: int = 0):


def geometric_adstock(
x, alpha: float = 0.0, l_max: int = 12, normalize: bool = False, axis: int = 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getting a mypy issue

@wd60622
Copy link
Contributor Author

wd60622 commented Oct 22, 2023

Also think that this is still missing the preprocess call which seems to be missing just transform method call. I have some more details about my suggestion here #407

@wd60622 wd60622 marked this pull request as draft October 23, 2023 15:38
@wd60622
Copy link
Contributor Author

wd60622 commented Oct 26, 2023

Still working through the tests as removing the mixins requires a lot of changes 😢 However, the new interface should be somewhat stable.

I can keep the mixins and just use the transformers that are within to support new data if that is more desirable. I'd appreciate any feedback and requests @ricardoV94 @juanitorduz @cluhmann

Comment on lines +123 to +125
if self.validate_data:
self.validate("X", X)
self.validate("y", y)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New order because the x_transformer.transform(X) will not have date_col

@wd60622
Copy link
Contributor Author

wd60622 commented Nov 15, 2023

Will continue this off of the branch that @cetagostini is working on

@wd60622 wd60622 closed this Nov 15, 2023
@wd60622 wd60622 deleted the new-data-coords branch November 20, 2023 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant