
CLV API Standardization #527

Open
3 of 6 tasks
ColtAllen opened this issue Feb 9, 2024 · 5 comments


ColtAllen commented Feb 9, 2024

There are API inconsistencies in the CLV module. Standardization is a big task best broken down into multiple PRs:

PRs to be Completed in Order

Current API

Beta-Geo/NBD Transactions Model

```python
rfm_data = pd.DataFrame(
    {
        "customer_id": customer_id,
        "frequency": frequency,
        "recency": recency,
        "T": T,
    }
)

model = BetaGeoModel(data=rfm_data)
model.build_model()
model.fit()

model.expected_num_purchases(
    customer_id=rfm_data["customer_id"],
    t=10,
    frequency=rfm_data["frequency"],
    recency=rfm_data["recency"],
    T=rfm_data["T"],
)
```

Note how fit data is provided as a dataframe, but the predictive methods require individual arrays. Specifying one array at a time was one of the most annoying aspects of using the legacy lifetimes library, and sometimes even created indexing issues that caused the underlying scipy functions to crash.
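To make that hazard concrete, here is a minimal pandas sketch (toy data, not the library's API) of how separately passed Series with mismatched indices silently align on labels rather than position, producing NaNs that can crash downstream scipy routines:

```python
import pandas as pd

# Toy RFM frame standing in for real transaction summaries
rfm_data = pd.DataFrame(
    {
        "customer_id": [0, 1, 2],
        "frequency": [6, 3, 9],
        "recency": [10.0, 5.0, 12.0],
        "T": [14.0, 14.0, 14.0],
    }
)

# Filtering leaves a non-contiguous index (0, 2)...
subset = rfm_data[rfm_data["frequency"] > 4]

# ...so combining subset["recency"] with a differently-indexed
# T array aligns on index labels, not position, yielding NaNs:
misaligned = subset["recency"] + rfm_data["T"].iloc[:2]
print(misaligned.isna().any())  # NaNs appear where labels don't match
```

Passing a single dataframe keeps all columns on one shared index, so this class of bug cannot occur.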

For ParetoNBDModel, I streamlined this nonsense with a dataframe argument, and made it optional when running predictions on the fit dataset:

Pareto/NBD Transactions Model

```python
rfm_data = pd.DataFrame(
    {
        "customer_id": customer_id,
        "frequency": frequency,
        "recency": recency,
        "T": T,
    }
)

model = ParetoNBDModel(data=rfm_data)
model.build_model()
model.fit()

# Data param is optional and only required for out-of-sample data
model.expected_purchases(future_t=10)

model.expected_purchases(
    data=future_rfm_df,
    future_t=10,
)
```

(We will also need to resolve the naming inconsistencies between these models.)

I've been told passing in dataframes instead of arrays loses some xarray broadcasting functionality, which I'd be interested to hear more about. I'm not opposed to arrays being passed in provided it's optional for in-sample data.
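A hypothetical sketch of that convention (class and method names invented for illustration, not the pymc-marketing implementation): store the fit dataframe and fall back to it when no `data` argument is passed.

```python
import pandas as pd

class TransactionModelSketch:
    """Illustrative only -- not the actual library class."""

    def __init__(self, data: pd.DataFrame):
        self.data = data  # retained for in-sample predictions

    def expected_purchases(self, future_t, data=None):
        # `data` is only required for out-of-sample predictions;
        # otherwise the fit dataset is reused.
        df = self.data if data is None else data
        return df["frequency"] * future_t  # placeholder math

model = TransactionModelSketch(pd.DataFrame({"frequency": [2, 5]}))
in_sample = model.expected_purchases(future_t=10)
out_of_sample = model.expected_purchases(
    future_t=10, data=pd.DataFrame({"frequency": [1]})
)
```

The same `data=None` default could accept either a pandas or xarray object, leaving room for the broadcasting question above.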

The API discrepancies between these models necessitated a hotfix for the monetary value model, which follows the same conventions as BetaGeoModel:

Gamma/Gamma Monetary Value Model

```python
monetary_data = pd.DataFrame(
    {
        "customer_id": customer_id,
        "mean_transaction_value": monetary_value,
        "frequency": frequency,
    }
)

model = GammaGammaModel(data=monetary_data)
model.build_model()
model.fit()

model.expected_customer_lifetime_value(
    transaction_model=transaction_model,
    customer_id=rfm_data["customer_id"],
    mean_transaction_value=rfm_data["monetary_value"],
    frequency=rfm_data["frequency"],
    recency=rfm_data["recency"],
    T=rfm_data["T"],
    time=12,
    discount_rate=0.01,
    freq="W",
)
```

Lastly, ShiftedBetaGeoModelIndividual is a whole different animal since it handles contractual transactions, but I think it'd be a good idea to add support for it to the customer_lifetime_value utility:

Shifted Beta-Geo Contractual Model

```python
contract_data = pd.DataFrame(
    {
        "customer_id": customer_id,
        "t_churn": t_churn,
        "T": T,
    }
)

model = ShiftedBetaGeoModelIndividual(data=contract_data)
model.build_model()
model.fit()

model.distribution_customer_churn_time(customer_id=contract_data["customer_id"])
```
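As a reference point for that churn-time distribution, the shifted-Beta-Geometric pmf from Fader and Hardie can be sketched with log-gamma functions (standalone closed-form math, not the model's sampling-based output):

```python
from math import lgamma, exp

def betaln(x, y):
    # log of the Beta function via log-gamma
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def sbg_churn_pmf(t, a, b):
    # P(T = t) = B(a + 1, b + t - 1) / B(a, b), for t = 1, 2, ...
    return exp(betaln(a + 1, b + t - 1) - betaln(a, b))

# With a = b = 1 the pmf reduces to 1 / (t * (t + 1))
p1 = sbg_churn_pmf(1, 1.0, 1.0)
```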
@ricardoV94
Contributor

I think the best option would be to work with xarray Datasets. They have the organizational benefits of pandas with the broadcasting behavior of numpy. Internally, most predictive methods are already written with xarray code anyway. Users could pass pandas dataframes and we would convert to xarray, but the default type, which needs no conversion, would be xarray.

Definitely agree that passing separate numpy arrays is too cumbersome.
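A rough illustration of that flow, assuming xarray is installed (toy data; the internal code may differ): convert the indexed dataframe once, then let new dimensions broadcast automatically.

```python
import pandas as pd
import xarray as xr

rfm_data = pd.DataFrame(
    {"customer_id": [0, 1, 2], "frequency": [6, 3, 9]}
).set_index("customer_id")

# One-time conversion from pandas to an xarray Dataset
ds = rfm_data.to_xarray()

# A new dimension (e.g., several future horizons) broadcasts
# against the customer dimension with no manual reshaping:
future_t = xr.DataArray([1, 5, 10], dims="t")
grid = ds["frequency"] * future_t  # dims: (customer_id, t)
```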

@ColtAllen
Collaborator Author

Updated original comment with list of PRs to complete.

@wd60622
Contributor

wd60622 commented May 22, 2024

Is t / future_t meant to be vectorized in the API? I think the previous implementation had it as either a vector the same size as each other input or a scalar.

@ColtAllen
Collaborator Author

> Is t / future_t meant to be vectorized in the API? I think the previous implementation had it as either a vector the same size as each other input or a scalar.

Both forms of parametrization (vectorized or scalar) are supported:

```python
# scalar parametrization (here predictions are being run for in-sample data)
model.expected_purchases(future_t=10)

# equivalent vectorized parametrization
data = data.assign(future_t=10)
model.expected_purchases(data)
```

Vectorization support was added to facilitate xarray inputs in the future.
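The equivalence rests on ordinary numpy broadcasting; a simplified stand-in for the per-customer math:

```python
import numpy as np

frequency = np.array([6.0, 3.0, 9.0])  # one entry per customer

# A scalar future_t broadcasts across all customers...
scalar_form = frequency * 10

# ...matching a per-customer vector holding the same value:
vector_form = frequency * np.full(3, 10.0)
```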

ColtAllen mentioned this issue Jun 1, 2024
ColtAllen added this to the 0.7.0 milestone Jun 13, 2024
@ColtAllen
Collaborator Author

Steps 4-6 (along with adding CLV support for ShiftedBetaGeoModelIndividual) are extraneous and will be given their own issues after #758 is merged.

juanitorduz removed this from the 0.7.0 milestone Jun 28, 2024