Budget allocation and optimal point estimations #329

Merged
merged 67 commits into from
Oct 2, 2023

Conversation

cetagostini
Copy link
Contributor

@cetagostini cetagostini commented Jul 25, 2023

Hello team!

Context

As marketers, we're perpetually hunting for ways to amp up the effectiveness of the campaigns we churn out regularly.

Hence, possessing reliable methods to allocate our budget efficiently—wherein we maximize our outcomes—is paramount in ensuring the success of our marketing investments.

Competitors

Other tools like Robyn and Lightweight MMM offer features to create budget allocation aimed at maximizing the target. This highlights the importance it holds within the community.

Goal

Integrating a budget allocator function within PyMC-Marketing that allows users to conveniently select a budget and allocate their resources to those channels that maximize the value of their target variable.

Secondary goals (Value proposition)

  1. Include optional variables that allow the use of business rules within the budget optimizer, so users can tinker more freely and optimize the variable based on their own beliefs (priors).
  2. Incorporate additional information that helps users understand where they should not overspend and where the returns on their spend start to diminish.
  3. Add uncertainty around the curves.

Hypothesis

We are dealing with diminishing-return curves, which have a sigmoidal shape: as x grows large (approaching infinity), the sigmoid function approaches its upper limit. This is based on my reading of the logistic_saturation function in mmm/transformers.py.

We should find two sections on the curve:

  1. One where the curve rises and each additional unit of spend gets more results, even if those results are small in absolute numbers.
  2. Another where the curve rises very slowly: it gets a lot of results in absolute numbers, but at a higher cost.

These two sections are divided by the elbow, the point where the curve bends and starts to flatten.

image

Solution

If we know the function that defines the curve of our data, we can project it to discern which channels have the most potential for development according to our model and which do not. This empowers the budget allocator to distribute our resources non-linearly, based on the data.

Example output:
Screenshot 2023-07-25 at 23 15 05

Screenshot 2023-07-25 at 23 09 14

Work

To achieve our goal, the first step was to find a function that could describe the points coherently. Using the original function was challenging because it is scaled: it operates on values and parameters between 0 and 1, which are difficult to retrieve after the model is trained.

I chose the Michaelis-Menten function, which is expressed as follows:
y = (L * x) / (k + x)

where:

  • L is the maximum value the curve approaches as x goes to infinity (akin to the saturation level).
  • k is the x value when the function reaches half its maximum value. This can be thought of as a measure of the steepness of the curve or the position of the "elbow".
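The two parameter definitions above can be checked numerically. The following is a minimal sketch, not the code from this PR: the function name mirrors the michaelis_menten signature that appears in the diff, while the values L = 10 and k = 2 are arbitrary assumptions chosen for illustration.

```python
def michaelis_menten(x: float, L: float, k: float) -> float:
    """Saturating response: approaches L as x grows and equals L / 2 at x == k."""
    return L * x / (k + x)


# Illustrative values only (assumptions, not taken from the PR).
L, k = 10.0, 2.0

half = michaelis_menten(k, L, k)            # spend equal to k
far = michaelis_menten(1_000_000.0, L, k)   # very large spend

print(f"y(k) = {half}")
print(f"y(1e6) = {far:.5f} (below L = {L})")
```

At x = k the response is exactly half of L, and for very large x it gets close to L without reaching it.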

What does it look like?

  • When k=0.5, the curve saturates quickly and then plateaus, because the "elbow" is close to the y-axis.
  • As k increases, the curve looks more and more like a straight line, with the elbow moving away from the y-axis.
    Note: this holds for a fixed value of L.
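To make the effect of k concrete, the sketch below holds L fixed and reports which share of L is reached at one fixed spend level for several values of k. The spend level x = 1 and the k values are arbitrary assumptions for illustration.

```python
def michaelis_menten(x: float, L: float, k: float) -> float:
    return L * x / (k + x)


L = 1.0   # same saturation level for every curve
x = 1.0   # fixed spend level (illustrative)

# Share of the saturation level L reached at spend x, per value of k.
fractions = {k: michaelis_menten(x, L, k) / L for k in (0.5, 2.0, 10.0)}

for k, fraction in fractions.items():
    print(f"k={k:>4}: {fraction:.0%} of L reached at x={x}")
```

The smaller k is, the larger the share of L already reached at the same spend, which is what "saturates quickly" means here.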

image

How do these changes affect the current curve fitting?

Since the current fit is quadratic, the fitted curve sometimes turns downward after its maximum point, suggesting that any additional input yields negative returns. That reading can make sense for marginal returns, but in this case the curve describes absolute values.
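The behaviour described above can be reproduced without any plotting library. This is a sketch under assumptions: the data are synthetic points generated from a Michaelis-Menten curve with L = 5 and k = 2, and fit_quadratic is a hypothetical helper that solves the least-squares normal equations directly. It is not the Seaborn routine used in the plots.

```python
def michaelis_menten(x, L, k):
    return L * x / (k + x)


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c via the 3x3 normal equations."""
    s = [sum(x ** p for x in xs) for p in range(5)]                  # sums of x^0 .. x^4
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]  # sums of y*x^0 .. y*x^2
    # Row p encodes: c*s[p] + b*s[p+1] + a*s[p+2] = t[p]
    rows = [[s[p], s[p + 1], s[p + 2], t[p]] for p in range(3)]
    for i in range(3):  # Gauss-Jordan elimination
        pivot = rows[i][i]
        rows[i] = [v / pivot for v in rows[i]]
        for j in range(3):
            if j != i:
                factor = rows[j][i]
                rows[j] = [vj - factor * vi for vj, vi in zip(rows[j], rows[i])]
    c, b, a = (rows[i][3] for i in range(3))
    return a, b, c


def quadratic(x, coeffs):
    a, b, c = coeffs
    return a * x * x + b * x + c


# Synthetic, noise-free saturating data on spend values 0 .. 10 (assumption).
xs = [0.5 * i for i in range(21)]
ys = [michaelis_menten(x, 5.0, 2.0) for x in xs]

coeffs = fit_quadratic(xs, ys)
a, b, c = coeffs
vertex = -b / (2 * a)  # spend at which the parabola peaks

print(f"leading coefficient a = {a:.4f}, parabola peaks at x = {vertex:.2f}")
for x in (10.0, 20.0, 40.0):
    print(x, round(quadratic(x, coeffs), 2), round(michaelis_menten(x, 5.0, 2.0), 2))
```

Because the data are concave, the fitted leading coefficient is negative, so the parabola has a maximum at a finite spend and decreases afterwards, while the Michaelis-Menten curve keeps increasing towards L.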

Example 1: Current fit
Screenshot 2023-07-25 at 23 35 19

Example 2: Modified fit (Michaelis-Menten)
Screenshot 2023-07-25 at 23 36 02

Great to hear your opinions here: @ricardoV94 @juanitorduz

Questions

  • First of all, I would like your feedback on the reasoning and logic behind this, and on whether it makes sense to move in this direction with PyMC-Marketing. Based on my experience it makes a lot of sense, but I could be biased.
  • Even though the model fits the data with a logistic (sigmoid) function, the response curve plots use a quadratic (polynomial) fit, as provided by Seaborn. Why?

Help

  • Unit tests: I have already built some, but I'm not very familiar with them. It would be amazing if you could check the functions and validate that they work as expected.

Important

This PR is a synthesis of the one previously opened here. The idea is to take all the feedback received earlier to create this cleaner and more succinct draft for new contributors. I hope you find it appropriate. I'm also open to suggestions.

I still need to run this on some notebooks to ensure that the results are coherent. Although everything has worked as custom functions in my test notebook, I must try installing this version of the PR.

Code snippet

#Estimate your curve (Michaelis Menten)
parameters = mmm.compute_channel_estimate_points_original_scale()

#Budget allocation based on the estimations
mmm.budget_allocation(
    total_budget=5,
    parameters = parameters,
    budget_bounds = {'x1':[1,30],
                     'x2':[1,60]
                     }
)
#Check the curve
mmm.plot_direct_contribution_curves(show_estimations=True)
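For readers who want to see what such an allocation computes, here is a dependency-free sketch. It is not the optimizer implemented in this PR: it uses the equal-marginal-return condition for Michaelis-Menten curves and a bisection on the Lagrange multiplier. The function names, the (L, k) layout of the parameters dictionary, and the numeric values are all assumptions for illustration.

```python
import math
from typing import Dict, Tuple

Params = Dict[str, Tuple[float, float]]  # assumed layout: channel -> (L, k)
Bounds = Dict[str, Tuple[float, float]]  # channel -> (min_spend, max_spend)


def contribution(x: float, L: float, k: float) -> float:
    return L * x / (k + x)


def spend_at_marginal_return(lam: float, L: float, k: float, lo: float, hi: float) -> float:
    """Spend where the slope L*k/(k+x)**2 equals lam, clipped to [lo, hi]."""
    x = math.sqrt(L * k / lam) - k
    return min(max(x, lo), hi)


def allocate(total_budget: float, parameters: Params, bounds: Bounds) -> Dict[str, float]:
    """Split total_budget so that every unclipped channel has the same marginal return."""
    lo_sum = sum(lo for lo, _ in bounds.values())
    hi_sum = sum(hi for _, hi in bounds.values())
    if not lo_sum <= total_budget <= hi_sum:
        raise ValueError("total_budget is not reachable within the given bounds")

    def total_spend(lam: float) -> float:
        return sum(
            spend_at_marginal_return(lam, *parameters[c], *bounds[c]) for c in parameters
        )

    lam_lo, lam_hi = 1e-12, 1e12
    for _ in range(200):
        lam_mid = math.sqrt(lam_lo * lam_hi)  # bisection on a log scale
        if total_spend(lam_mid) > total_budget:
            lam_lo = lam_mid  # spending too much: require a higher marginal return
        else:
            lam_hi = lam_mid
    lam = math.sqrt(lam_lo * lam_hi)
    return {c: spend_at_marginal_return(lam, *parameters[c], *bounds[c]) for c in parameters}


# Illustrative curve parameters (assumptions); the bounds mirror the snippet above.
parameters = {"x1": (10.0, 2.0), "x2": (6.0, 1.0)}
budget_bounds = {"x1": (1.0, 30.0), "x2": (1.0, 60.0)}

allocation = allocate(5.0, parameters, budget_bounds)
for channel, spend in allocation.items():
    print(f"{channel}: spend={spend:.3f}, contribution={contribution(spend, *parameters[channel]):.3f}")
```

The sketch assumes positive L and k and non-negative bounds. A general-purpose constrained optimizer is the more flexible choice when the curves are not Michaelis-Menten or when extra business constraints apply.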

Example notebook

Google Colab: #329 PR

cc: @juanitorduz @ricardoV94 @twiecki @cluhmann


📚 Documentation preview 📚: https://pymc-marketing--329.org.readthedocs.build/en/329/

@@ -33,3 +36,43 @@ def generate_fourier_modes(
for func in ("sin", "cos")
}
)


def michaelis_menten(x, L, k) -> float:
Copy link
Contributor

This is also useful as a dedicated saturation function in mmm.transformers

Copy link
Contributor Author

I like this idea. Looking at Google's Lightweight MMM, it includes several model configurations (Hill, Carryover, Adstock) that you can choose from to define saturation and lagging.
Example:
Screenshot 2023-08-01 at 14 49 09

They refer to the following wiki, where the saturation function mentioned is very similar to Michaelis-Menten. I imagine we could implement something similar here, so that different data sets can be fitted with whichever saturation and lagging functions suit their conditions best.

Screenshot 2023-08-01 at 14 46 00

fig_estimations, ax_estimations = plt.subplots(figsize=(8, 6))

L, k = estimate_menten_parameters(channel, self.X, channel_contributions)
plateau_x = k * (0.99 * L / (L * 0.01))
Copy link
Collaborator

Why do you use 0.99 and 0.01 here?

Copy link
Contributor Author

@cetagostini cetagostini Aug 10, 2023

It's a practical solution. The Michaelis-Menten equation is y = (L * x) / (k + x), where k is the value of x (the substrate concentration, in the original setting) at which y is half of L. In other words, when x = k, y = L/2. As x increases, y approaches L.

This happens because y saturates: adding more x doesn't significantly increase y. The value L is an asymptote of the function, meaning the curve approaches L but never quite reaches it.

Using 0.99 and 0.01 gives the x at which y is 99% of L. Simplified, this is x = 99k.
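The 99k result follows from solving p * L = L * x / (k + x) for x, which gives x = k * p / (1 - p) regardless of L. With p = 0.99 this is k * 0.99 / 0.01 = 99k. A small sketch of that calculation, where saturation_point is a hypothetical helper name and the values of L and k are arbitrary:

```python
import math


def michaelis_menten(x: float, L: float, k: float) -> float:
    return L * x / (k + x)


def saturation_point(k: float, fraction: float = 0.99) -> float:
    """Return the x at which L * x / (k + x) equals fraction * L (independent of L)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return k * fraction / (1.0 - fraction)


L, k = 10.0, 2.0  # illustrative values (assumptions)
plateau_x = saturation_point(k)

print(f"plateau_x = {plateau_x:.6f} (99 * k = {99 * k})")
print(f"share of L reached = {michaelis_menten(plateau_x, L, k) / L:.4f}")
print(math.isclose(plateau_x, k * (0.99 * L / (L * 0.01))))  # expression from the diff
```

Choosing another fraction, such as 0.95, moves the plateau point to 19k under the same formula.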

Copy link
Collaborator

Thanks! Do you want to add a shorter summary to the doc strings? :)

Copy link
Collaborator

@juanitorduz juanitorduz left a comment

This was very nice and easy to review, thanks! I left some small comments about code style.

I will continue testing the budget allocator in the mmm example notebook :)

pymc_marketing/mmm/base.py Outdated Show resolved Hide resolved
pymc_marketing/mmm/base.py Outdated Show resolved Hide resolved
pymc_marketing/mmm/base.py Outdated Show resolved Hide resolved
Comment on lines 510 to 511
parameters: Optional[Dict[str, Tuple[float, float]]],
budget_bounds: Optional[Dict[str, Tuple[float, float]]],
Copy link
Collaborator

If we use the Optional type, does it mean these arguments can be None? Otherwise they should not be Optional. See https://mypy.readthedocs.io/en/stable/kinds_of_types.html

Optional[...] does not mean a function argument with a default value. It simply means that None is a valid value for the argument. This is a common confusion because None is a common default value for arguments.

Copy link
Contributor Author

Totally correct. In this case, parameters should not be optional, only the budget bounds. Changed budget bounds to be optional and modified parameters accordingly.

def budget_allocator(
    total_budget: int = 1000,
    channels: Union[List[str], Tuple[str, ...]] = [],
    parameters: Dict[str, Tuple[float, float]] = {},
    budget_ranges: Optional[Dict[str, Tuple[float, float]]] = None,
) -> DataFrame:
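The distinction raised in this thread can be sketched as follows. resolve_budget_ranges is a hypothetical helper, not a function from the PR, and the fallback range of (0, total_budget) per channel is an assumption. Required arguments are simply left without defaults, which also avoids mutable default values such as [] or {}.

```python
from typing import Dict, Optional, Tuple

Bounds = Dict[str, Tuple[float, float]]


def resolve_budget_ranges(
    total_budget: float,
    channels: Tuple[str, ...],
    budget_ranges: Optional[Bounds] = None,
) -> Bounds:
    """Hypothetical helper: fill in per-channel ranges when none are given."""
    if budget_ranges is None:
        # None is a valid value here, which is exactly what Optional[...] expresses.
        return {channel: (0.0, float(total_budget)) for channel in channels}
    return dict(budget_ranges)


print(resolve_budget_ranges(100, ("x1", "x2")))
print(resolve_budget_ranges(100, ("x1",), {"x1": (1.0, 30.0)}))
```

total_budget and channels are required and not Optional, so a type checker would flag a call that passes None for them.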

pymc_marketing/mmm/base.py Outdated Show resolved Hide resolved
pymc_marketing/mmm/budget_optimizer.py Outdated Show resolved Hide resolved
pymc_marketing/mmm/budget_optimizer.py Outdated Show resolved Hide resolved
return contributions


def objective_distribution(x, channels, parameters):
Copy link
Collaborator

add type hints

pymc_marketing/mmm/budget_optimizer.py Outdated Show resolved Hide resolved
pymc_marketing/mmm/budget_optimizer.py Outdated Show resolved Hide resolved
@juanitorduz
Copy link
Collaborator

juanitorduz commented Aug 9, 2023

@cetagostini Would you mind sharing example code showing how this will be used in our example? You can assume the mmm object is already fitted. That would help the review.

I want to run:

mmm.budget_allocation(
    total_budget=1000,
)

and I get the (expected) error

TypeError                                 Traceback (most recent call last)
Cell In[43], line 1
----> 1 mmm.budget_allocation(
      2     total_budget=1000,
      3 )

TypeError: BaseMMM.budget_allocation() missing 2 required positional arguments: 'parameters' and 'budget_bounds'

So a reproducible code snippet will help the review :)

@cetagostini
Copy link
Contributor Author

cetagostini commented Aug 10, 2023

@juanitorduz Again, thank you very much for your review!

Before sharing, I did a small verification and found a couple of issues to be corrected. It took me a bit longer, but here is the snippet and a small Colab to check the results, so you can play with it.

I'll be working through all your comments this week; I believe it should be ready by Monday! 🚀

Code snippet

#Estimate your curve (Michaelis Menten)
parameters = mmm.compute_channel_estimate_points_original_scale()

#Budget allocation based on the estimations
mmm.optimize_channel_budget_for_maximum_contribution(
    total_budget=5,
    parameters = parameters,
    budget_bounds = {'x1':[1,30],
                     'x2':[1,60]
                     }
)
#Check the curve
mmm.plot_direct_contribution_curves(show_estimations=True)

Example notebook

Google Colab: #329 PR

@cetagostini
Copy link
Contributor Author

cetagostini commented Aug 10, 2023

@juanitorduz All the changes requested are already applied.

tests/mmm/test_utils.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@juanitorduz juanitorduz left a comment

Looks very nice and works on the example notebook. I left minor comments regarding naming and type hints. After that I think we can merge this and add an experimental warning to get feedback from users 💪

pymc_marketing/mmm/base.py Show resolved Hide resolved
pymc_marketing/mmm/base.py Outdated Show resolved Hide resolved
pymc_marketing/mmm/budget_optimizer.py Show resolved Hide resolved
budget_ranges=budget_bounds,
)

def compute_channel_estimate_points_original_scale(self) -> Dict:
Copy link
Collaborator

I would rename this method to compute_channel_plateau_points_original_scale

Copy link
Contributor Author

I switched to compute_channel_curve_parameters_original_scale, given that we can now use different curve fits. What do you think?

}

def plot_direct_contribution_curves(
self, show_estimations: bool = False, x_stop=None
Copy link
Collaborator

Can we rename the parameter show_estimations to show_michaelis_menten_fit?

Copy link
Collaborator

By default we should keep the original range. In the example notebook when I run

fig = mmm.plot_direct_contribution_curves(show_estimations=True)
[ax.set(xlabel="x") for ax in fig.axes]

I get:

image

Copy link
Collaborator

I would rename x_stop to xlim_max

Copy link
Collaborator

Otherwise, using x_stop = 1.5 I get
image

"estimated_contribution": calculate_expected_contribution(
parameters, optimal_budget
),
"optimal_budget": optimal_budget,
Copy link
Collaborator

From the example notebook I get
image
Should we add the expected total value instead of NaN?

Copy link
Contributor Author

Correct, already working as expected.

Screenshot 2023-08-29 at 00 40 31

@juanitorduz
Copy link
Collaborator

juanitorduz commented Aug 16, 2023

@cetagostini I also suggest opening a followup documentation PR where we load the mmm model from the example (we can use the model builder here) to run and explain this optimization procedure.

@codecov
Copy link

codecov bot commented Sep 11, 2023

Codecov Report

Merging #329 (aeb7faf) into main (3a9db51) will decrease coverage by 5.82%.
Report is 1 commits behind head on main.
The diff coverage is 43.12%.

@@            Coverage Diff             @@
##             main     #329      +/-   ##
==========================================
- Coverage   94.52%   88.71%   -5.82%     
==========================================
  Files          20       21       +1     
  Lines        1663     1869     +206     
==========================================
+ Hits         1572     1658      +86     
- Misses         91      211     +120     
Files Coverage Δ
pymc_marketing/mmm/utils.py 89.58% <87.80%> (-10.42%) ⬇️
pymc_marketing/mmm/budget_optimizer.py 84.44% <84.44%> (ø)
pymc_marketing/mmm/base.py 64.79% <13.60%> (-32.72%) ⬇️

@juanitorduz
Copy link
Collaborator

juanitorduz commented Sep 14, 2023

Tests and lint are 🟢 ! Yay! I will take a look in the upcoming days :)

@twiecki
Copy link
Contributor

twiecki commented Sep 14, 2023

Can we merge?

@cetagostini
Copy link
Contributor Author

cetagostini commented Sep 19, 2023

Can we merge?

I think we are set. Not sure if the team has other questions @twiecki

From my side:

  1. All checks already pass except codecov. Still, I added a few new unit tests to increase coverage.
  2. The branch is updated up to September 19th.
  3. I ran a test on the current notebook using my branch and everything works properly (Test here).
  4. I ran a test on the new example notebook for budget allocation and it works as well (Test here)

I added some flexibility to the curve_fit functions so that users can call them outside of the mmm class. This also makes it possible to handle fit failures raised by the optimizer. These new options are shown in the budget allocation notebook example here
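As a dependency-free illustration of fitting the curve outside the mmm class and handling a failed fit, the sketch below estimates L and k with the Lineweaver-Burk linearization 1/y = (k/L) * (1/x) + 1/L. This is a swapped-in technique for illustration, not the curve_fit-based routine from this PR, and fit_michaelis_menten is a hypothetical name. The linearization can overweight points with small spend, so treat the result as a rough estimate.

```python
from typing import Sequence, Tuple


def fit_michaelis_menten(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Estimate (L, k) by linear regression of 1/y on 1/x.

    Raises ValueError when the data cannot support a saturating fit.
    """
    pairs = [(1.0 / x, 1.0 / y) for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two points with positive spend and contribution")
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    var_u = sum((u - mean_u) ** 2 for u, _ in pairs)
    if var_u == 0:
        raise ValueError("all spend values are identical")
    slope = sum((u - mean_u) * (v - mean_v) for u, v in pairs) / var_u  # equals k / L
    intercept = mean_v - slope * mean_u                                   # equals 1 / L
    if slope <= 0 or intercept <= 0:
        raise ValueError("data do not look like a saturating curve")
    L = 1.0 / intercept
    return L, slope * L


# Noise-free synthetic data from L = 10, k = 2 (assumption for illustration).
spend = [1.0, 2.0, 4.0, 8.0]
contributions = [10.0 * x / (2.0 + x) for x in spend]
L_hat, k_hat = fit_michaelis_menten(spend, contributions)
print(f"L = {L_hat:.3f}, k = {k_hat:.3f}")

# A convex series cannot be described by a saturating curve, so the fit fails loudly.
try:
    fit_michaelis_menten([1.0, 2.0, 4.0], [1.0, 4.0, 16.0])
except ValueError as error:
    print(f"fit failed: {error}")
```

Raising an explicit error lets the caller decide how to proceed, for example by skipping the channel or supplying parameters manually.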

The pre-commit looks correct when I test it locally:
Screenshot 2023-09-19 at 20 16 15

The error now reported by docs/readthedocs.org:pymc-marketing is related to the new NumPy version, but I assume this should be solved by the GitHub checks configuration. Right?

Looking forward to your feedback!

cc: @ricardoV94 @juanitorduz

@juanitorduz
Copy link
Collaborator

Can we add a warning saying that this feature is experimental? What do you think ?

@cetagostini
Copy link
Contributor Author

Can we add a warning saying that this feature is experimental? What do you think ?

Sure, where do you think we should add it? In the notebook? @juanitorduz

@juanitorduz
Copy link
Collaborator

Can we add a warning saying that this feature is experimental? What do you think ?

Sure, where do you think we should add it? In the notebook? @juanitorduz

Sorry for the late reply 🙈 ! What about the public methods regarding optimization in pymc_marketing/mmm/base.py?

@cetagostini
Copy link
Contributor Author

cetagostini commented Sep 30, 2023

I added an alert to the docstring of each new function in base.py. Is that okay? @juanitorduz

Screenshot 2023-09-30 at 14 23 04

@juanitorduz
Copy link
Collaborator

I added an alert to the docstring of each new function in base.py. Is that okay? @juanitorduz

Screenshot 2023-09-30 at 14 23 04

Thank you @cetagostini ! In addition, we can add a warning in the code itself as:

import warnings

...

warnings.warn("This budget allocator method is experimental", UserWarning)

After that we can merge from my side :)
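A minimal sketch of how such a warning could sit inside a public method. BudgetMixin and optimize_budget are hypothetical stand-ins, not names from base.py, and the method body is a placeholder.

```python
import warnings


class BudgetMixin:
    """Hypothetical stand-in for the class that owns the optimization methods."""

    def optimize_budget(self, total_budget: float) -> float:
        warnings.warn(
            "This budget allocator method is experimental",
            UserWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return total_budget  # placeholder for the real optimization result


# Capture the warning to show that it is emitted on every call.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = BudgetMixin().optimize_budget(100.0)

print(result, [str(w.message) for w in caught])
```

Users who have already read the notice can silence it with the standard warnings filters.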

Copy link
Collaborator

@juanitorduz juanitorduz left a comment

@cetagostini Thank you for this contribution! IMO let us ship this as experimental, get feedback from users, and iterate!

WDYT @ricardoV94 ? @twiecki ?

@twiecki
Copy link
Contributor

twiecki commented Oct 2, 2023

Yes, we should merge this.

@twiecki twiecki merged commit 8e4fe3e into pymc-labs:main Oct 2, 2023
11 of 12 checks passed
@twiecki
Copy link
Contributor

twiecki commented Oct 2, 2023

Incredible contribution @cetagostini!

@cetagostini
Copy link
Contributor Author

cetagostini commented Oct 5, 2023

Hey guys! I didn't have time to reply properly a few days ago; I just want to highlight a few things.

  1. Thank you so much for the support, I'm thrilled to see the merge finally happening 🙌🏻 I'm quite inspired by this momentum, and I'll surely try to help with the other issues that are now active, since this one is closed.
  2. A special mention to @juanitorduz: without all the time he dedicated to helping me move this PR forward, I would surely have stopped halfway. He is a great source of inspiration in the community, as well as a great person to debate and share ideas with.

ps: I'll probably be doing a little self-promotion this week to invite users to test.

Vamos PyMC-Marketing! 🚀

4 participants