Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Budget allocation and optimal point estimations #329

Merged
merged 67 commits into from
Oct 2, 2023
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
67 commits
Select commit Hold shift + click to select a range
b8555cd
Creating utils & Budget optimizer function.
cetagostini Jul 25, 2023
8cd82ce
Modifying plot
cetagostini Jul 25, 2023
2bbd10f
Adding docstring on the function
cetagostini Jul 25, 2023
f449f6d
Modifying unit test
cetagostini Jul 25, 2023
01c1abc
modifying quantile method
cetagostini Aug 9, 2023
17d9da6
Updating
cetagostini Aug 9, 2023
f0ad14a
Update output docstring
cetagostini Aug 9, 2023
56b9a37
Parameter order error
cetagostini Aug 9, 2023
7a595e8
Format code using black
cetagostini Aug 9, 2023
a2f31d4
Modifying plot curves
cetagostini Aug 9, 2023
2e008ea
Applying Juan corrections
cetagostini Aug 10, 2023
265ab69
Adding new model saturation function
cetagostini Aug 26, 2023
abcde2d
Merge remote-tracking branch 'upstream/main' into dev_budget_allocati…
cetagostini Aug 26, 2023
d9ca6f5
Checking methods, modify names
cetagostini Aug 26, 2023
036bc1a
Changes
cetagostini Aug 26, 2023
c492005
Adjusting error
cetagostini Aug 26, 2023
98d0418
Adding docstring to functions
cetagostini Aug 26, 2023
7d66ac9
changing parameters
cetagostini Aug 26, 2023
2076ff3
Debugging error
cetagostini Aug 26, 2023
03fa95f
Returning values
cetagostini Aug 26, 2023
3ae0a67
Debugging lines out
cetagostini Aug 26, 2023
427dd19
Budget allocation scenarios function
cetagostini Aug 27, 2023
281c379
Solving error in function budget allocation
cetagostini Aug 27, 2023
b9f6fc9
Modifying alphas
cetagostini Aug 27, 2023
6f8294c
Modifying plot order
cetagostini Aug 27, 2023
7a5137d
Small changes on plot details
cetagostini Aug 27, 2023
a5ab632
Small changes in plot
cetagostini Aug 28, 2023
475504b
Modify Legend
cetagostini Aug 28, 2023
5cab673
Creating Budget Allocation example
cetagostini Aug 28, 2023
9f99c46
Small change on the notebook
cetagostini Aug 28, 2023
e85323c
Switching words
cetagostini Aug 28, 2023
e532ac9
Adding missing code
cetagostini Aug 28, 2023
d3fce02
Making budget bounds optional param
cetagostini Aug 28, 2023
e240817
Modifying notebook
cetagostini Aug 28, 2023
bf0c73d
Changing introduction
cetagostini Aug 28, 2023
a019e37
Adding links
cetagostini Aug 28, 2023
af344eb
Correcting grammar v1
cetagostini Aug 28, 2023
e23ebd7
changing title
cetagostini Aug 28, 2023
dba0f24
Checking unit test
cetagostini Aug 28, 2023
cd9a40c
Modifying narrative v2
cetagostini Aug 28, 2023
b3fa3b3
Correcting error on unit test
cetagostini Aug 29, 2023
c48691e
Modify Narrative V3
cetagostini Aug 29, 2023
a37cbde
Changing Load Model Section
cetagostini Aug 29, 2023
f5fa1eb
Small grammar correction
cetagostini Aug 29, 2023
05d6bb5
Modifying Unit Test & Docstrings
cetagostini Sep 5, 2023
7305590
Deleting Warning on Plot
cetagostini Sep 5, 2023
84e9f73
Small changes modifications
cetagostini Sep 6, 2023
9e34c31
Adding type hints
cetagostini Sep 6, 2023
7da6986
Adding type hints & docstrings
cetagostini Sep 6, 2023
db5bfcd
Solving Unit Test Errors
cetagostini Sep 6, 2023
acb343e
Adding last unit test
cetagostini Sep 8, 2023
06adcdd
Changing small
cetagostini Sep 9, 2023
df5d883
Updating functions and descriptions
cetagostini Sep 9, 2023
1ebeddb
update branch
cetagostini Sep 11, 2023
a572a3a
Reverting update
cetagostini Sep 11, 2023
5d2832f
Merge remote-tracking branch 'upstream/main' into dev_budget_allocati…
cetagostini Sep 11, 2023
38aae8e
Merge remote-tracking branch 'upstream/main' into dev_budget_allocati…
cetagostini Sep 12, 2023
bd4cb15
new unit test to increase coverage report
cetagostini Sep 14, 2023
c679b88
Merge remote-tracking branch 'upstream/main' into dev_budget_allocati…
cetagostini Sep 19, 2023
331971a
Increase ranges to find best_fit
cetagostini Sep 19, 2023
2af4f0a
Adding flexibility to estimation functions
cetagostini Sep 19, 2023
51afe7c
Small corrections
cetagostini Sep 19, 2023
975a89d
Correcting error on worflows
cetagostini Sep 28, 2023
cef8c86
Adding Experimental note
cetagostini Sep 29, 2023
1c4f093
Adding warnings.
cetagostini Sep 30, 2023
7562714
deleting by default values
cetagostini Sep 30, 2023
aeb7faf
Correcting position values
cetagostini Sep 30, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
139 changes: 121 additions & 18 deletions pymc_marketing/mmm/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@
from sklearn.preprocessing import FunctionTransformer
from xarray import DataArray, Dataset

from pymc_marketing.mmm.budget_optimizer import budget_allocator
from pymc_marketing.mmm.utils import estimate_menten_parameters, michaelis_menten
from pymc_marketing.mmm.validating import (
ValidateChannelColumns,
ValidateDateColumn,
Expand Down Expand Up @@ -472,8 +474,105 @@ def compute_channel_contribution_original_scale(self) -> DataArray:
coords=channel_contribution.coords,
)

def plot_direct_contribution_curves(self) -> plt.Figure:
"""Plots the direct contribution curves. The term "direct" refers to the fact
def _plot_estimations(self, x: np.ndarray, channel: str, i: int) -> plt.Figure:
juanitorduz marked this conversation as resolved.
Show resolved Hide resolved

channel_contributions = self.compute_channel_contribution_original_scale().mean(
["chain", "draw"]
)

fig_estimations, ax_estimations = plt.subplots(figsize=(8, 6))
juanitorduz marked this conversation as resolved.
Show resolved Hide resolved

L, k = estimate_menten_parameters(channel, self.X, channel_contributions)
plateau_x = k * (0.99 * L / (L * 0.01))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you use these 0.99 and 0.01 ?

Copy link
Contributor Author

@cetagostini cetagostini Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a practical solution because the Michaelis-Menten equation is given by: y = (k + x) / (L * x) where k represents the substrate concentration at which the y is half of L. In other words, when x = k and y = L/2. As x increases the equation approaches: y≈L.

As obvious because y becomes saturated; adding more x doesn't significantly increase y. The value L is an asymptote for the function, meaning the curve approaches L but never quite reaches it.

Using 0.99 and 0.01 calculates the x point when y is 99% of L. Simplified, it results in 99k.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Do you want to add a shorter summary to the doc strings? :)

elbow_y = michaelis_menten(k, L, k)

x_fit = np.linspace(0, plateau_x - (max(x) * 2), 1000)
y_fit = michaelis_menten(x_fit, L, k)

ax_estimations.plot(x_fit, y_fit, color=f"C{i}", label="Fit Curve", alpha=0.6)

ax_estimations.plot(
k,
elbow_y,
"go",
color=f"C{i}",
markerfacecolor="white",
)

ax_estimations.set(xlabel="Spent", ylabel="Contribution")
ax_estimations.legend()

return fig_estimations

def budget_allocation(
self,
total_budget: int,
parameters: Optional[Dict[str, Tuple[float, float]]],
budget_bounds: Optional[Dict[str, Tuple[float, float]]],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have Optional type it mean that they can be None ? Otherwise is not an Optional type. See https://mypy.readthedocs.io/en/stable/kinds_of_types.html

Optional[...] does not mean a function argument with a default value. It simply means that None is a valid value for the argument. This is a common confusion because None is a common default value for arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally correct, in this case, parameters should not be optional, only budget bounds. Change budget bounds to be optional and modify parameters based on this.

def budget_allocator(
    total_budget: int = 1000,
    channels: Union[List[str], Tuple[str, ...]] = [],
    parameters: Dict[str, Tuple[float, float]] = {},
    budget_ranges: Optional[Dict[str, Tuple[float, float]]] = None,
) -> DataFrame:

) -> pd.DataFrame:
"""
Allocate the budget optimally among different channels based on estimations and budget constraints.
juanitorduz marked this conversation as resolved.
Show resolved Hide resolved

Parameters
----------
total_budget : int, requiere
The total budget available for allocation.
parameters : dict, requiere
A DataFrame containing estimations and information about different channels.
juanitorduz marked this conversation as resolved.
Show resolved Hide resolved
budget_bounds : dict, optional
juanitorduz marked this conversation as resolved.
Show resolved Hide resolved
A dictionary specifying the budget bounds for each channel.

Returns
-------
Dict
A dictionary containing the allocated budget and contribution information.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns dict or data frame?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.


Raises
------
ValueError
If any of the required parameters are not provided or have an incorrect type.
"""
if not isinstance(budget_bounds, dict):
raise ValueError("The 'budget_bounds' parameter must be a dictionary.")

if not isinstance(total_budget, (int, float)):
raise ValueError(
"The 'total_budget' parameter must be an integer or float."
)

return budget_allocator(
total_budget=total_budget,
channels=self.channel_columns,
parameters=parameters,
budget_ranges=budget_bounds,
)

def compute_channel_estimate_points_original_scale(self) -> Dict:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this method as compute_channel_plateat_points_original_scale

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switch to compute_channel_curve_parameters_original_scale given the fact we can use a differents curve fit now. What do you think?

"""
Estimate optimal and plateau points for each channel.

Returns
-------
pd.DataFrame
A DataFrame with the estimated points.
"""
parameters = {}
channel_contributions = self.compute_channel_contribution_original_scale().mean(
["chain", "draw"]
)

for channel in self.channel_columns:
parameters[channel] = estimate_menten_parameters(
channel, self.X, channel_contributions
)

return parameters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use a simple dict comprehension

        return {
            channel: estimate_menten_parameters(
                channel, self.X, channel_contributions)
            for channel in self.channel_columns
        } 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!


def plot_direct_contribution_curves(
self, show_estimations: bool = False
) -> plt.Figure:
"""
Plots the direct contribution curves. The term "direct" refers to the fact
we plots costs vs immediate returns and we do not take into account the lagged
effects of the channels e.g. adstock transformations.

Expand All @@ -485,6 +584,7 @@ def plot_direct_contribution_curves(self) -> plt.Figure:
channel_contributions = self.compute_channel_contribution_original_scale().mean(
["chain", "draw"]
)

fig, axes = plt.subplots(
nrows=self.n_channel,
ncols=1,
Expand All @@ -496,24 +596,27 @@ def plot_direct_contribution_curves(self) -> plt.Figure:

for i, channel in enumerate(self.channel_columns):
ax = axes[i]

if self.X is not None:
sns.regplot(
x=self.X[self.channel_columns].to_numpy()[:, i],
y=channel_contributions.sel(channel=channel),
color=f"C{i}",
order=2,
ci=None,
line_kws={
"linestyle": "--",
"alpha": 0.5,
"label": "quadratic fit",
},
ax=ax,
)
ax.legend(loc="upper left")
ax.set(title=f"{channel}", xlabel="total_cost_eur")
x = self.X[self.channel_columns].to_numpy()[:, i]
y = channel_contributions.sel(channel=channel).to_numpy()

ax.scatter(x, y, label=f"{channel}", color=f"C{i}")

if show_estimations:
fig_estimations = self._plot_estimations(x, channel, i)
fig.append(fig_estimations)

ax.legend(
loc="upper left",
facecolor="white",
title=f"{channel} Legend",
fontsize="small",
)

ax.set(xlabel="Spent", ylabel="Contribution")

fig.suptitle("Contribution Plots", fontsize=16)
fig.suptitle("Direct response curves", fontsize=16)
return fig

def compute_mean_contributions_over_time(
Expand Down
111 changes: 111 additions & 0 deletions pymc_marketing/mmm/budget_optimizer.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
# optimization_utils.py
from typing import Dict, List, Optional, Tuple, Union

import numpy as np
from pandas import DataFrame
from scipy.optimize import minimize

from pymc_marketing.mmm.utils import michaelis_menten


def calculate_expected_contribution(parameters, optimal_budget):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add return type hint and arguments type hint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

"""
Calculate the total expected contribution of budget allocations across various channels.
juanitorduz marked this conversation as resolved.
Show resolved Hide resolved

Returns
-------
dict
juanitorduz marked this conversation as resolved.
Show resolved Hide resolved
A dictionary with channels as keys and their respective contributions as values.
The key 'total' contains the total expected contribution.
"""

total_expected_contribution = 0
contributions = {}

for channel, budget in optimal_budget.items():
L, k = parameters[channel]
contributions[channel] = michaelis_menten(budget, L, k)
total_expected_contribution += contributions[channel]

contributions["total"] = total_expected_contribution

return contributions


def objective_distribution(x, channels, parameters):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add type hints

"""
Calculate the objective function value for a given budget distribution.

Parameters
----------
x : list of float
The budget distribution across channels.

Returns
-------
float
The value of the objective function given the budget distribution.
"""

sum_contributions = 0

for channel, budget in zip(channels, x):
L, k = parameters[channel]
sum_contributions += michaelis_menten(budget, L, k)

return -1 * sum_contributions


def optimize_budget_distribution(total_budget, budget_ranges, parameters, channels):
"""
Calculate the optimal budget distribution that minimizes the objective function.
juanitorduz marked this conversation as resolved.
Show resolved Hide resolved

Returns
-------
dict
A dictionary with channels as keys and the optimal budget for each channel as values.
"""

if budget_ranges is None:
budget_ranges = {
channel: [0, min(total_budget, parameters[channel][0])]
for channel in channels
}

initial_guess = [total_budget / len(channels)] * len(channels)

bounds = [budget_ranges[channel] for channel in channels]

constraints = {"type": "eq", "fun": lambda x: np.sum(x) - total_budget}

result = minimize(
objective_distribution,
initial_guess,
args=(channels, parameters),
method="SLSQP",
bounds=bounds,
constraints=constraints,
)

return {channel: budget for channel, budget in zip(channels, result.x)}


def budget_allocator(
total_budget: int = 1000,
channels: Union[List[str], Tuple[str]] = [],
parameters: Optional[Dict[str, Tuple[float, float]]] = {},
budget_ranges: Optional[Dict[str, Tuple[float, float]]] = {},
juanitorduz marked this conversation as resolved.
Show resolved Hide resolved
) -> DataFrame:

optimal_budget = optimize_budget_distribution(
total_budget, channels, parameters, budget_ranges
)

return DataFrame(
{
"estimated_contribution": calculate_expected_contribution(
optimal_budget, parameters
),
"optimal_budget": optimal_budget,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the example notebook I get
image
Should we add the expected total value instead of NaN?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, already working as expected.

Screenshot 2023-08-29 at 00 40 31

}
)
43 changes: 43 additions & 0 deletions pymc_marketing/mmm/utils.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
from typing import List

import numpy as np
import numpy.typing as npt
import pandas as pd
from scipy.optimize import curve_fit


def generate_fourier_modes(
Expand Down Expand Up @@ -33,3 +36,43 @@ def generate_fourier_modes(
for func in ("sin", "cos")
}
)


def michaelis_menten(x, L, k) -> float:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also useful as a dedicated saturation function in mmm.transformers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea, checking Google MMM lightweight they had included several model configurations (Hill, Carryover, Adstock) which you could choose to define Saturation and Lagging.
Example:
Screenshot 2023-08-01 at 14 49 09

They refer to the following wiki where the saturation mentioned it is very similar to the Michaelis Menten. I could imagine we can implement something similar here, in order to be more extensive to how different existing data sets can fit better to different saturation and lagging functions depending on their own conditions.

Screenshot 2023-08-01 at 14 46 00

"""
Calculate the Michaelis-Menten function value.

Parameters
----------
x : float
The spent on a channel.
L : float
The maximum contribution a channel can make (also known as the plateau point).
k : float
The elbow on the function in `x` (Point where the curve change their direction)

Returns
-------
float
The value of the Michaelis-Menten function given the parameters.
"""

return L * x / (k + x)


def estimate_menten_parameters(
channel: str,
original_dataframe,
contributions,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type hints

) -> List[float]:

x = original_dataframe[channel].to_numpy()
y = contributions.sel(quantile=0.5).sel(channel=channel).to_numpy()

# Initial guess for L and k
initial_guess = [max(y), 0.001]
# Curve fitting
popt, pcov = curve_fit(michaelis_menten, x, y, p0=initial_guess)

# Save the parameters
return popt
52 changes: 52 additions & 0 deletions tests/mmm/test_budget_optimizer.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
import pandas as pd
import pytest

from pymc_marketing.mmm.budget_optimizer import budget_allocator


@pytest.mark.parametrize(
"allocation_mode, total_budget, channels, parameters, budget_ranges, expected",
[
(
"growth",
1000,
["channel1", "channel2"],
{"channel1": 0.5, "channel2": 0.5},
{"channel1": (0, 1000), "channel2": (0, 1000)},
pd.DataFrame(
{
"estimated_contribution": {"channel1": 250, "channel2": 250},
"optimal_budget": {"channel1": 500, "channel2": 500},
}
),
),
(
"growth",
2000,
["channel1", "channel2", "channel3"],
{"channel1": 0.3, "channel2": 0.3, "channel3": 0.4},
{"channel1": (0, 1000), "channel2": (0, 1000), "channel3": (0, 1000)},
pd.DataFrame(
{
"estimated_contribution": {
"channel1": 300,
"channel2": 300,
"channel3": 400,
},
"optimal_budget": {
"channel1": 600,
"channel2": 600,
"channel3": 800,
},
}
),
),
],
)
def test_budget_allocator(
allocation_mode, total_budget, channels, parameters, budget_ranges, expected
):
result = budget_allocator(
allocation_mode, total_budget, channels, parameters, budget_ranges
)
pd.testing.assert_frame_equal(result, expected)
Loading
Loading