Add predict_sklearn method #535

markgoodhead · 2022-06-13T13:07:03Z

An example PR for what a predict_sklearn method would look like on the model object (not for merging/review, just for discussion)

markgoodhead · 2022-06-13T13:10:53Z

Example script of use:

import bambi as bmb
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

size = 2_000
x = rng.normal(size=size)
data = pd.DataFrame(
    {
        "x": x,
        "y": rng.normal(loc=x, size=size)
    }
)

bmb_model = bmb.Model("y ~ x", data)
idata = bmb_model.fit()

test_size = 100
test_x = rng.normal(size=test_size)
test_data = pd.DataFrame(
    {
        "x": test_x,
        "y": rng.normal(loc=test_x, size=test_size)
    }
)

predictions = bmb_model.predict_sklearn(idata, "y", test_data)

tomicapretto · 2022-06-13T21:38:55Z

Thanks for moving this issue so quickly. I would like to add the following.

I'm in favor of having a utility to predict using point estimates. I'm sure many will find this useful.
I'm not sure if this should be a method or a function.
- I hesitate because I don't know how good is to have two predict* methods.
I'm not in favor of using predict_sklearn as the name. I know it could be super clear for those coming with a sklearn background, but I think it will be confusing for all the rest. I think predict_point_estimate is a better name because it's clearer, although it's quite long.

Apart from that, if it is a method in Model we don't need the name of the variable we want to predict. We can recover it from the model itself. We can use self.response.name instead of predict_variable in return idata.posterior_predictive[predict_variable].mean(("chain", "draw")).values

aloctavodia · 2022-06-13T23:27:48Z

I still think that the best approach is to show how to work with InferenceData and not add new functions/method for this case. And if necessary improve InferenceData.

But following this proposal. I agree that having two predict methods is not good design. Having and utility function could work. Maybe something like compute_point, or aggregate_predictions or something similar. Or hopefully a more obvious name. It should be clear from the name that the function works with predictions. Otherwise people may want to use it to compute point estimates from posteriors parameters or whatever.

Yet another option is to add options or arguments to predict. But that will probably make that method a bit clunky and less clean than now.

What other point estimates we want? mean and median seems like immediate candidates, but what about other stuff like trimmed mean (or similar)...

aloctavodia · 2022-06-13T23:31:43Z

BTW, i think most if not all the friction could be solved by adding a scikitlearn migration guide, or intro to bambi for scikitlearn users.

aloctavodia · 2022-08-06T21:17:04Z

Hi @markgoodhead what do you think about my proposal of writing a scikitlearn migration guide? Are you interested in working on it?

tomicapretto · 2023-09-26T11:43:37Z

It seems we don't have interest in implementing such a method. It would be nice to have a migration guide for those coming from scikit learn. See #729

mark added 2 commits June 13, 2022 14:04

Add predict_sklearn method

05eb0dc

Run black

4e288c2

markgoodhead mentioned this pull request Jun 13, 2022

Surprising predict() behaviour #533

Closed

Use human-readable indexes for xarray object

209992d

tomicapretto mentioned this pull request Sep 26, 2023

Add migration guide for those coming from scikit learn #729

Open

tomicapretto closed this Sep 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add predict_sklearn method #535

Add predict_sklearn method #535

markgoodhead commented Jun 13, 2022

markgoodhead commented Jun 13, 2022

tomicapretto commented Jun 13, 2022

aloctavodia commented Jun 13, 2022

aloctavodia commented Jun 13, 2022 •

edited

Loading

aloctavodia commented Aug 6, 2022

tomicapretto commented Sep 26, 2023

Add predict_sklearn method #535

Add predict_sklearn method #535

Conversation

markgoodhead commented Jun 13, 2022

markgoodhead commented Jun 13, 2022

tomicapretto commented Jun 13, 2022

aloctavodia commented Jun 13, 2022

aloctavodia commented Jun 13, 2022 • edited Loading

aloctavodia commented Aug 6, 2022

tomicapretto commented Sep 26, 2023

aloctavodia commented Jun 13, 2022 •

edited

Loading