Budget allocation and optimal point estimations #329
pymc_marketing/mmm/utils.py
Outdated
@@ -33,3 +36,43 @@ def generate_fourier_modes( | |||
for func in ("sin", "cos") | |||
} | |||
) | |||
|
|||
|
|||
def michaelis_menten(x, L, k) -> float: |
This is also useful as a dedicated saturation function in mmm.transformers
I like this idea. Looking at Google's Lightweight MMM, they include several model configurations (Hill, Carryover, Adstock) that you can choose from to define saturation and lagging.
Example:
They refer to the following wiki, where the saturation function mentioned is very similar to Michaelis-Menten. I could imagine implementing something similar here, so that different data sets can be fitted with whichever saturation and lagging functions suit their own conditions.
pymc_marketing/mmm/base.py
Outdated
fig_estimations, ax_estimations = plt.subplots(figsize=(8, 6)) | ||
|
||
L, k = estimate_menten_parameters(channel, self.X, channel_contributions) | ||
plateau_x = k * (0.99 * L / (L * 0.01)) |
Why do you use 0.99 and 0.01 here?
It's a practical solution. The Michaelis-Menten equation is given by y = L * x / (k + x), where k is the value of x (the substrate concentration, in the original setting) at which y is half of L. In other words, when x = k, y = L/2. As x increases, y approaches L.
This is expected because y becomes saturated: adding more x doesn't significantly increase y. The value L is an asymptote of the function, meaning the curve approaches L but never quite reaches it.
Using 0.99 and 0.01 calculates the point x at which y is 99% of L. Setting y = 0.99 * L and solving for x gives x = k * 0.99 / 0.01, which simplifies to 99k.
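The 99k result can be checked numerically. The sketch below assumes the standard Michaelis-Menten form y = L * x / (k + x); the helper `plateau_point`, its `fraction` argument, and the sample values of L and k are illustrative and not part of the PR.

```python
import math


def michaelis_menten(x: float, L: float, k: float) -> float:
    """Standard Michaelis-Menten curve: y = L * x / (k + x)."""
    return L * x / (k + x)


def plateau_point(k: float, fraction: float = 0.99) -> float:
    """Return the x at which y reaches `fraction` of L (hypothetical helper).

    Solving fraction * L = L * x / (k + x) for x gives
    x = k * fraction / (1 - fraction), so L cancels out.
    """
    return k * fraction / (1.0 - fraction)


# Illustrative parameter values, chosen only for the check below.
L, k = 10.0, 2.0
x_99 = plateau_point(k)

print(math.isclose(x_99, 99 * k))                            # plateau sits at 99k
print(math.isclose(michaelis_menten(x_99, L, k), 0.99 * L))  # y is 99% of L there
print(math.isclose(michaelis_menten(k, L, k), L / 2))        # y is L/2 at x = k
```

Because L cancels out, the plateau point depends only on k and on the chosen fraction, which is why the expression in the diff reduces to a multiple of k.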
Thanks! Do you want to add a shorter summary to the doc strings? :)
This was very nice and easy to review, thanks! I left some small comments about code style.
I will continue testing the budget allocator in the mmm example notebook :)
pymc_marketing/mmm/base.py
Outdated
parameters: Optional[Dict[str, Tuple[float, float]]], | ||
budget_bounds: Optional[Dict[str, Tuple[float, float]]], |
If we use the Optional type, does it mean these arguments can be None? Otherwise they should not be Optional. See https://mypy.readthedocs.io/en/stable/kinds_of_types.html
Optional[...] does not mean a function argument with a default value. It simply means that None is a valid value for the argument. This is a common confusion because None is a common default value for arguments.
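To illustrate the distinction, here is a minimal sketch. The function names `describe_bounds` and `scale_budget` and the `Bounds` alias are hypothetical and only serve the example; they are not functions from this PR.

```python
from typing import Dict, Optional, Tuple

# Hypothetical alias for per-channel (lower, upper) budget bounds.
Bounds = Dict[str, Tuple[float, float]]


def describe_bounds(budget_bounds: Optional[Bounds] = None) -> str:
    # Optional[...]: None is an accepted value, so the body must handle it.
    if budget_bounds is None:
        return "no bounds"
    return ", ".join(
        f"{channel}: {low}-{high}" for channel, (low, high) in budget_bounds.items()
    )


def scale_budget(total_budget: float = 1000.0) -> float:
    # Has a default value but is NOT Optional: None is not a valid argument.
    return total_budget / 2


print(describe_bounds())
print(describe_bounds({"x1": (1, 30)}))
print(scale_budget())
```

An argument should be annotated as Optional only when the function body really accepts None, regardless of whether the argument has a default.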
Totally correct, in this case, parameters should not be optional, only budget bounds. Change budget bounds to be optional and modify parameters based on this.
def budget_allocator(
total_budget: int = 1000,
channels: Union[List[str], Tuple[str, ...]] = [],
parameters: Dict[str, Tuple[float, float]] = {},
budget_ranges: Optional[Dict[str, Tuple[float, float]]] = None,
) -> DataFrame:
return contributions | ||
|
||
|
||
def objective_distribution(x, channels, parameters): |
add type hints
@cetagostini Would you mind sharing example code showing how this will be used in our example? Assume that I want to run:
mmm.budget_allocation(
    total_budget=1000,
)
and I get the (expected) error:
TypeError Traceback (most recent call last)
Cell In[43], line 1
----> 1 mmm.budget_allocation(
2 total_budget=1000,
3 )
TypeError: BaseMMM.budget_allocation() missing 2 required positional arguments: 'parameters' and 'budget_bounds'
So a reproducible code snippet will help the review :)
@juanitorduz Again, thank you very much for your review! Before sharing, I did a small verification and found a couple of issues to correct. It took me a bit longer, but here is the snippet and a small Colab to check the results, so you can play with it. I'll be working through all your comments this week; I believe it should be ready by Monday! 🚀
Code snippet
# Estimate your curve (Michaelis-Menten)
parameters = mmm.compute_channel_estimate_points_original_scale()
# Budget allocation based on the estimations
mmm.optimize_channel_budget_for_maximum_contribution(
    total_budget=5,
    parameters=parameters,
    budget_bounds={'x1': [1, 30],
                   'x2': [1, 60]},
)
# Check the curve
mmm.plot_direct_contribution_curves(show_estimations=True)
Example notebook
@juanitorduz All the changes requested are already applied. |
Looks very nice and works on the example notebook. I left minor comments regarding naming and type hints. After that, I think we can merge this and add an experimental warning to get feedback from users 💪
pymc_marketing/mmm/base.py
Outdated
budget_ranges=budget_bounds, | ||
) | ||
|
||
def compute_channel_estimate_points_original_scale(self) -> Dict: |
I would rename this method to compute_channel_plateau_points_original_scale
I switched to compute_channel_curve_parameters_original_scale, given that we can now use different curve fits. What do you think?
pymc_marketing/mmm/base.py
Outdated
} | ||
|
||
def plot_direct_contribution_curves( | ||
self, show_estimations: bool = False, x_stop=None |
Can we rename the parameter show_estimations to show_michaelis_menten_fit?
I would rename x_stop to xlim_max
"estimated_contribution": calculate_expected_contribution( | ||
parameters, optimal_budget | ||
), | ||
"optimal_budget": optimal_budget, |
@cetagostini I also suggest opening a followup documentation PR where we load the mmm model from the example (we can use the model builder here) to run and explain this optimization procedure. |
Codecov Report
@@ Coverage Diff @@
## main #329 +/- ##
==========================================
- Coverage 94.52% 88.71% -5.82%
==========================================
Files 20 21 +1
Lines 1663 1869 +206
==========================================
+ Hits 1572 1658 +86
- Misses 91 211 +120
|
…on_branch Updating branch
Tests and lint are 🟢 ! Yay! I will take a look in the upcoming days :) |
Can we merge? |
I think we are set. Not sure if the team has other questions @twiecki From my side:
I added some flexibility to the The The error given now by Wait for your feedback! |
Can we add a warning saying that this feature is experimental? What do you think ? |
Sure, where do you think can we added? Notebook? @juanitorduz |
Sorry for the late reply 🙈 ! What about the public methods regarding optimization in |
I added an alert to the docstrings located under |
Thank you @cetagostini! In addition, we can add a warning in the code itself:
import warnings
...
warnings.warn("This budget allocator method is experimental", UserWarning)
After that we can merge from my side :)
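As a rough sketch of how that warning behaves when placed inside a method, consider the following. The function `allocate_budget` is a hypothetical stand-in for the public optimization methods and is not the code in this PR; only the `warnings` calls are real standard-library API.

```python
import warnings


def allocate_budget(total_budget: float) -> float:
    """Hypothetical stand-in for a public optimization method."""
    warnings.warn(
        "This budget allocator method is experimental",
        UserWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return total_budget


# Users (or tests) can capture the warning explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = allocate_budget(1000)

print(result, [str(w.message) for w in caught])
```

Emitting the warning at call time, rather than only documenting it, means users see it even if they never open the docstring.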
@cetagostini Thank you for this contribution! IMO let us ship this as experimental, get feedback from users, and iterate!
WDYT @ricardoV94 ? @twiecki ?
Yes, we should merge this. |
Incredible contribution @cetagostini! |
Hey guys! I didn't have time to reply properly a few days ago; I just want to highlight a few things.
ps: I'll probably be doing a little self-promotion this week to invite users to test. Vamos PyMC-Marketing! 🚀 |
Hello team!
Context
As marketers, we're perpetually hunting for ways to amp up the effectiveness of the campaigns we churn out regularly.
Hence, possessing reliable methods to allocate our budget efficiently—wherein we maximize our outcomes—is paramount in ensuring the success of our marketing investments.
Competitors
Other tools like Robyn and Lightweight MMM offer features to create budget allocation aimed at maximizing the target. This highlights the importance it holds within the community.
Goal
Integrating a budget allocator function within PyMC-Marketing that allows users to conveniently select a budget and allocate their resources to those channels that maximize the value of their target variable.
Secondary goals (Value proposition)
Hypothesis
We are dealing with diminishing-returns curves, which have a sigmoidal shape: as x becomes very large (approaching infinity), the function approaches its upper limit. This is based on my interpretation of the logistic_saturation function in mmm/transformers.py.
We should find different sections on the curve:
These two sections of the curve would be divided by the elbow, basically where our curve changes direction and starts to flatten.
Solution
If we know the function that defines the curve of our data, we can project it to discern which channels have the most potential for development according to our model and which do not. This empowers the budget allocator to distribute our resources non-linearly, based on the data.
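To make the idea concrete, here is a minimal sketch of such an allocation. It is not the optimizer implemented in this PR. It assumes each channel follows a Michaelis-Menten curve y = L * x / (k + x) with known (L, k), and it spends the budget in small steps, each time on the channel with the largest marginal gain, a greedy scheme that is reasonable here because the curves are concave. The names `greedy_allocation`, `marginal_gain`, `step`, and the parameter values are illustrative assumptions.

```python
from typing import Dict, Tuple


def michaelis_menten(x: float, L: float, k: float) -> float:
    """Assumed response curve: y = L * x / (k + x)."""
    return L * x / (k + x)


def marginal_gain(x: float, step: float, L: float, k: float) -> float:
    """Extra contribution from spending one more `step` at spend level x."""
    return michaelis_menten(x + step, L, k) - michaelis_menten(x, L, k)


def greedy_allocation(
    total_budget: float,
    parameters: Dict[str, Tuple[float, float]],
    step: float = 1.0,
) -> Dict[str, float]:
    """Spend the budget step by step on the channel with the best marginal gain."""
    budget = {channel: 0.0 for channel in parameters}
    for _ in range(int(round(total_budget / step))):
        best = max(
            parameters,
            key=lambda ch: marginal_gain(budget[ch], step, *parameters[ch]),
        )
        budget[best] += step
    return budget


# Illustrative (L, k) pairs: same k, but x1 has the higher ceiling L.
parameters = {"x1": (10.0, 5.0), "x2": (4.0, 5.0)}
allocation = greedy_allocation(total_budget=20.0, parameters=parameters)
total = sum(michaelis_menten(allocation[ch], *parameters[ch]) for ch in parameters)
print(allocation, total)
```

The split is non-linear: the channel with more headroom receives more budget, but not all of it, because its marginal return shrinks as spend grows.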
Example output:
Work.
To achieve our goal, the first step was to find a function that could interpret the points in a coherent manner. Using the original function was challenging due to its scaled nature, and it operates with values and parameters between 0 and 1, which are difficult to retrieve after the model is trained.
I chose the Michaelis-Menten function, which is expressed as follows:
y = L * x / (k + x)
where:
What does it look like?
With k=0.5, the curve saturates quickly and then plateaus, because the "elbow" is closer to the y-axis.
PS: this assumes the same value for L.
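The effect of k can be seen with a few numbers. This sketch assumes the form y = L * x / (k + x) and holds L fixed; the values of x and k other than 0.5 are arbitrary choices for illustration.

```python
def michaelis_menten(x: float, L: float, k: float) -> float:
    """Assumed curve: y = L * x / (k + x)."""
    return L * x / (k + x)


L = 1.0  # same ceiling for every curve
x = 2.0  # fixed spend level at which the curves are compared

# Share of the ceiling L already reached at x, for increasing k.
share_of_L = {k: michaelis_menten(x, L, k) / L for k in (0.5, 2.0, 8.0)}

for k, share in share_of_L.items():
    print(f"k={k}: reaches {share:.0%} of L at x={x}")
```

A smaller k moves the elbow toward the y-axis, so at the same spend the curve is already much closer to its ceiling.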
How do these changes affect the current curve fitting?
Since we now have a quadratic function, at times the model tends to believe that after passing the maximum point, any additional input will yield negative returns. This is indeed true when discussing marginal returns, but in this case, our curve is dealing with absolute values.
Example 1: Current fit
Example 2: Modified fit (Michaelis-Menten)
Great to hear your opinions here: @ricardoV94 @juanitorduz
Questions
Help
Important
This PR is a synthesis of the one previously opened here. The idea is to take all the feedback received earlier to create this cleaner and more succinct draft for new contributors. I hope you find it appropriate. I'm also open to suggestions.
I still need to run this on some notebooks to ensure that the results are coherent. Although everything has worked as custom functions in my test notebook, I must try installing this version of the PR.
Code snippet
Example notebook
Google Colab: #329 PR
cc: @juanitorduz @ricardoV94 @twiecki @cluhmann
📚 Documentation preview 📚: https://pymc-marketing--329.org.readthedocs.build/en/329/