Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add predict_sklearn method #535

Closed
wants to merge 3 commits into from

Conversation

markgoodhead
Copy link
Contributor

An example PR for what a predict_sklearn method would look like on the model object (not for merging/review, just for discussion)

@markgoodhead
Copy link
Contributor Author

Example script of use:

import bambi as bmb
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

size = 2_000
x = rng.normal(size=size)
data = pd.DataFrame(
    {
        "x": x,
        "y": rng.normal(loc=x, size=size)
    }
)

bmb_model = bmb.Model("y ~ x", data)
idata = bmb_model.fit()

test_size = 100
test_x = rng.normal(size=test_size)
test_data = pd.DataFrame(
    {
        "x": test_x,
        "y": rng.normal(loc=test_x, size=test_size)
    }
)

predictions = bmb_model.predict_sklearn(idata, "y", test_data)

@tomicapretto
Copy link
Collaborator

Thanks for moving this issue so quickly. I would like to add the following.

  • I'm in favor of having a utility to predict using point estimates. I'm sure many will find this useful.
  • I'm not sure if this should be a method or a function.
    • I hesitate because I don't know how good is to have two predict* methods.
  • I'm not in favor of using predict_sklearn as the name. I know it could be super clear for those coming with a sklearn background, but I think it will be confusing for all the rest. I think predict_point_estimate is a better name because it's clearer, although it's quite long.

Apart from that, if it is a method in Model we don't need the name of the variable we want to predict. We can recover it from the model itself. We can use self.response.name instead of predict_variable in return idata.posterior_predictive[predict_variable].mean(("chain", "draw")).values

@aloctavodia
Copy link
Collaborator

I still think that the best approach is to show how to work with InferenceData and not add new functions/method for this case. And if necessary improve InferenceData.

But following this proposal. I agree that having two predict methods is not good design. Having and utility function could work. Maybe something like compute_point, or aggregate_predictions or something similar. Or hopefully a more obvious name. It should be clear from the name that the function works with predictions. Otherwise people may want to use it to compute point estimates from posteriors parameters or whatever.

Yet another option is to add options or arguments to predict. But that will probably make that method a bit clunky and less clean than now.

What other point estimates we want? mean and median seems like immediate candidates, but what about other stuff like trimmed mean (or similar)...

@aloctavodia
Copy link
Collaborator

aloctavodia commented Jun 13, 2022

BTW, i think most if not all the friction could be solved by adding a scikitlearn migration guide, or intro to bambi for scikitlearn users.

@aloctavodia
Copy link
Collaborator

Hi @markgoodhead what do you think about my proposal of writing a scikitlearn migration guide? Are you interested in working on it?

@tomicapretto
Copy link
Collaborator

It seems we don't have interest in implementing such a method. It would be nice to have a migration guide for those coming from scikit learn. See #729

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants