Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Creating a new Bayesian Regressor with PyMC as a backend #358

Open
wants to merge 48 commits into
base: main
Choose a base branch
from

Conversation

meraldoantonio
Copy link
Contributor

@meraldoantonio meraldoantonio commented May 23, 2024

Reference Issues/PRs

#7

What does this implement/fix? Explain your changes.

This WIP PR implements a Bayesian Linear Regressor with PyMC as a backend

Does your contribution introduce a new dependency? If yes, which one?

Yes - it depends on PyMC family: PyMC itself, XArray and ArviZ

What should a reviewer concentrate their feedback on?

The design of the BayesianLinearRegressor. Especially:

  1. The introduction of the priors. For now, the class hardcodes the priors. We need to think about the way in which the users should inject their own priors.

Did you add any tests for the change?

Not yet

Any other comments?

N/A

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
    See here for full badge reference
  • [ X] The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators

(This is not yet done)

  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@meraldoantonio meraldoantonio marked this pull request as draft May 23, 2024 17:11
fkiraly added a commit that referenced this pull request May 25, 2024
This PR removes the legacy base modules.

* base class: equivalent functionality is now contained in
`BaseDistribution`, `BaseProbaRegressor`, `_DelegatedProbaRegressor`
* pymc vendor interface: currently worked on in
#358
* density estimation: tracked via
#7
# Priors for unknown model parameters
self.intercept = pm.Normal("intercept", mu=self.intercept_mu, sigma=self.intercept_sigma)
self.slopes = pm.Normal("slopes", mu=self.slopes_mu, sigma=self.slopes_sigma, shape = self._X.shape[1], dims=("pred_id"))
self.noise = pm.HalfNormal("noise", sigma=self.noise_sigma)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would inverse gamma not be more standard here, as it is conjugate to the normal?

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 4, 2024

Strange import error - is this related to an upper bound of any of the imports implied by 3.9, e.g., scipy?

@meraldoantonio
Copy link
Contributor Author

Strange import error - is this related to an upper bound of any of the imports implied by 3.9, e.g., scipy?

Apparently there is a bug with Arviz 0.17 and scipy>=1.13 (source 1) (source 2).

The bug is no longer present in Arviz 0.18 but this requires Python 3.10 and above.

As a temporary solution, I've locked the scipy version in all-extras in project.toml

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 5, 2024

Makes sense.

From a maintenance perspective, applying the version bound in the pyproject.toml is not a good solution, since the lock is implied only by a single estimator, and not by scipy itself.

Could you add the lock instead in the python_dependencies tag of the estimator, and revert the changes to pyproject?

@meraldoantonio
Copy link
Contributor Author

meraldoantonio commented Oct 6, 2024

Makes sense.

From a maintenance perspective, applying the version bound in the pyproject.toml is not a good solution, since the lock is implied only by a single estimator, and not by scipy itself.

Could you add the lock instead in the python_dependencies tag of the estimator, and revert the changes to pyproject?

Makes sense! But I've tried this a couple of times and for some reason, without the pyproject.toml lock, the test framework keeps installing the "wrong" version of scipy (version 1.13.1), even after specifying "scipy<=1.12.0" in the python_dependencies tag...

It might be that other libraries are pulling in a conflicting version, but I haven't managed to find the exact cause..

Any ideas?

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 7, 2024

Any ideas?

Why are you trying to bound scipy instead of arviz? I would simply bound arviz>=0.18, based on your statements, as well as python_version >= 3.10

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 7, 2024

PS: why did you close the notebook PR? That was a nice notebook, and indeed it would be nice as separate PR.

@meraldoantonio
Copy link
Contributor Author

Any ideas?

Why are you trying to bound scipy instead of arviz? I would simply bound arviz>=0.18, based on your statements, as well as python_version >= 3.10

Ah ok! I was trying to make it work for python 3.9. I've now locked the python version and arviz version in the latest commit

PS: why did you close the notebook PR? That was a nice notebook, and indeed it would be nice as separate PR.

Just closed it temporarily as I'm still working on adding the update method. I’ll re-open it once that's done!

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 8, 2024

is this PR ready for merge?

I would really suggest to chunk PRs in smaller, self-contained additions, so we can merge your contributions more quickly and PR do not get too large. For instance, update we can add later and it should not "hurt" (or would it?)

@meraldoantonio
Copy link
Contributor Author

meraldoantonio commented Oct 9, 2024

is this PR ready for merge?

Yes it is!

I would really suggest to chunk PRs in smaller, self-contained additions, so we can merge your contributions more quickly and PR do not get too large. For instance, update we can add later and it should not "hurt" (or would it?)

Sure, noted! Yes, you're right, we can add the update method in later PRs

Also, I've documented the logic behind this class in this PR, could you please take a look at this when you have time? Thank you!

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 10, 2024

is this PR ready for merge?

Yes it is!

Then you should adjust the tagging/signposting:

  • remove the "WIP" tag from the title
  • turn the PR form draft to ready (top right side menu)
  • "re-request a review", press circling arrows next to dev pictures who have already reviewed

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 10, 2024

PS: there is a conflict in the pyproject.toml, as your PR does not seem up to date with the recent release. Fixing is as usual: sync your fork, and merge main into your branch.

@meraldoantonio meraldoantonio changed the title [ENH] (WIP) Creating a new Bayesian Regressor with PyMC as a backend [ENH] Creating a new Bayesian Regressor with PyMC as a backend Oct 10, 2024
@meraldoantonio meraldoantonio marked this pull request as ready for review October 10, 2024 15:39
@meraldoantonio
Copy link
Contributor Author

PS on the update method: Based on my research, there isn't an ideal method for performing update using PyMC. There are 2 possible routes, however, both have significant limitations:

  1. Rebuilding the prior from the posterior using marginal 1D empirical approximation: This approach, described in the PyMC documentation, works fine when dealing with a single variable. However, if we have multiple variables (such as in our case, where we have an intercept and a slope per feature), this method is problematic because it loses the correlation structure between variables.

  2. Rebuilding the prior using a multivariate normal approximation: Another approach is to approximate the posterior as a multivariate normal distribution, as discussed in the PyMC Experimental library. While this method retains the correlation between variables, it assumes normality, which may not hold in cases where the posterior is multimodal or skewed.

Instead of introducing a potentially misleading update method, I think it would be better to omit it at this stage

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks great!

There are some testing issues:

  • the failure on 3.9 still seems there, even though we clearly set the tag. Could you check if you can understand what is going on there?
  • get_test_params should be populated with at least two examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement module:regression probabilistic regression module
Projects
Status: PR in progress
Development

Successfully merging this pull request may close these issues.

2 participants