[ENH] Creating a new Bayesian Regressor with PyMC as a backend #358

meraldoantonio · 2024-05-23T17:11:20Z

Reference Issues/PRs

#7

What does this implement/fix? Explain your changes.

This WIP PR implements a Bayesian Linear Regressor with PyMC as a backend.

Does your contribution introduce a new dependency? If yes, which one?

Yes, it depends on the PyMC family: PyMC itself, xarray, and ArviZ.

What should a reviewer concentrate their feedback on?

The design of the BayesianLinearRegressor. Especially:

The introduction of the priors. For now, the class hardcodes the priors. We need to think about the way in which the users should inject their own priors.
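One possible way to let users inject priors is a plain dictionary that overrides hardcoded defaults. The sketch below is only an illustration of that idea: the helper `resolve_prior_config`, the default values, and the dictionary layout are all assumptions, not part of this PR. Only the prior names (`intercept`, `slopes`, `noise`) and the `mu`/`sigma` parameters follow the current code.

```python
import copy

# Hypothetical defaults; the numeric values are placeholders, not the PR's values.
DEFAULT_PRIOR_CONFIG = {
    "intercept": {"mu": 0.0, "sigma": 10.0},
    "slopes": {"mu": 0.0, "sigma": 10.0},
    "noise": {"sigma": 10.0},
}


def resolve_prior_config(prior_config=None):
    """Merge user-supplied prior parameters over the defaults.

    Hypothetical helper: unknown prior names raise, so typos fail early.
    """
    resolved = copy.deepcopy(DEFAULT_PRIOR_CONFIG)
    for name, params in (prior_config or {}).items():
        if name not in resolved:
            raise KeyError(f"unknown prior {name!r}")
        resolved[name].update(params)
    return resolved


# A user tightens only the slope prior; everything else keeps its default.
config = resolve_prior_config({"slopes": {"sigma": 1.0}})
```

Passing `None` returns the defaults unchanged, which would match a constructor signature where the prior argument defaults to `None`.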

Did you add any tests for the change?

Not yet

Any other comments?

N/A

PR checklist

For all contributions

I've added myself to the list of contributors with any new badges I've earned :-)
How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
See here for full badge reference
[x] The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

For new estimators

(This is not yet done)

I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.

skpro/regression/bayesian.py

This PR removes the legacy base modules. * base class: equivalent functionality is now contained in `BaseDistribution`, `BaseProbaRegressor`, `_DelegatedProbaRegressor` * pymc vendor interface: currently worked on in #358 * density estimation: tracked via #7

fkiraly · 2024-06-06T19:14:12Z

skpro/regression/bayesian.py

+            # Priors for unknown model parameters
+            self.intercept = pm.Normal("intercept", mu=self.intercept_mu, sigma=self.intercept_sigma)
+            self.slopes = pm.Normal("slopes", mu=self.slopes_mu, sigma=self.slopes_sigma, shape = self._X.shape[1], dims=("pred_id"))
+            self.noise = pm.HalfNormal("noise", sigma=self.noise_sigma)


Would an inverse gamma prior not be more standard here, as it is conjugate to the normal?
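To make the conjugacy point concrete: for a normal likelihood with known mean, an inverse gamma prior on the noise variance (not on the standard deviation) gives an inverse gamma posterior in closed form. The snippet below is a minimal sketch of that textbook update; the function name and the example numbers are made up for illustration and are not taken from the PR.

```python
def inverse_gamma_update(alpha, beta, residuals):
    """Posterior InverseGamma(alpha', beta') for the variance of a normal
    with known mean, given residuals r_i = y_i - mean.

    alpha' = alpha + n / 2
    beta'  = beta + sum(r_i^2) / 2
    """
    n = len(residuals)
    sum_sq = sum(r * r for r in residuals)
    return alpha + n / 2, beta + sum_sq / 2


# Illustrative residuals and prior hyperparameters (placeholders).
residuals = [0.5, -1.0, 0.25, 0.75]
alpha_post, beta_post = inverse_gamma_update(alpha=2.0, beta=1.0, residuals=residuals)

# Posterior mean of the variance, defined for alpha' > 1.
posterior_mean_variance = beta_post / (alpha_post - 1)
```

With MCMC sampling the conjugacy is not needed for correctness, so this mainly matters if a closed-form posterior is wanted.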

fkiraly · 2024-10-04T17:48:18Z

Strange import error - is this related to an upper bound of any of the imports implied by 3.9, e.g., scipy?

meraldoantonio · 2024-10-05T15:35:55Z

> Strange import error - is this related to an upper bound of any of the imports implied by 3.9, e.g., scipy?

Apparently there is a bug with Arviz 0.17 and scipy>=1.13 (source 1) (source 2).

The bug is no longer present in Arviz 0.18 but this requires Python 3.10 and above.

As a temporary solution, I've locked the scipy version in all-extras in pyproject.toml.

fkiraly · 2024-10-05T16:10:11Z

Makes sense.

From a maintenance perspective, applying the version bound in the pyproject.toml is not a good solution, since the lock is implied only by a single estimator, and not by scipy itself.

Could you add the lock instead in the python_dependencies tag of the estimator, and revert the changes to pyproject?

meraldoantonio · 2024-10-06T15:43:23Z

> Makes sense.
>
> From a maintenance perspective, applying the version bound in the pyproject.toml is not a good solution, since the lock is implied only by a single estimator, and not by scipy itself.
>
> Could you add the lock instead in the python_dependencies tag of the estimator, and revert the changes to pyproject?

Makes sense! But I've tried this a couple of times and for some reason, without the pyproject.toml lock, the test framework keeps installing the "wrong" version of scipy (version 1.13.1), even after specifying "scipy<=1.12.0" in the python_dependencies tag...

It might be that other libraries are pulling in a conflicting version, but I haven't managed to find the exact cause.

Any ideas?

fkiraly · 2024-10-07T00:36:02Z

> Any ideas?

Why are you trying to bound scipy instead of arviz? I would simply bound arviz>=0.18, based on your statements, as well as python_version >= 3.10
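As a rough illustration of what bounding arviz and Python at the estimator level could look like: the class below is a stand-in, not the actual estimator, and the exact tag values are an assumption based on this discussion. The helper `meets_lower_bound` is hypothetical and only handles simple `>=X.Y` bounds; it is not how the framework evaluates tags.

```python
class BayesianLinearRegressorSketch:
    """Stand-in class showing only the dependency-related tags (assumed values)."""

    _tags = {
        "python_version": ">=3.10",
        "python_dependencies": ["pymc", "arviz>=0.18.0"],
    }


def meets_lower_bound(installed, bound):
    """Return True if a dotted numeric version satisfies a '>=X.Y[.Z]' bound.

    Hypothetical helper for illustration; real specifier handling is richer.
    """
    if not bound.startswith(">="):
        raise ValueError("only '>=' bounds are handled in this sketch")

    def to_tuple(version):
        return tuple(int(part) for part in version.split("."))

    return to_tuple(installed) >= to_tuple(bound[2:])


python_bound = BayesianLinearRegressorSketch._tags["python_version"]
ok_on_310 = meets_lower_bound("3.10", python_bound)
ok_on_39 = meets_lower_bound("3.9", python_bound)
```

Under this setup the estimator would simply be skipped on Python 3.9 rather than forcing a scipy pin on the whole package.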

fkiraly · 2024-10-07T00:36:54Z

PS: why did you close the notebook PR? That was a nice notebook, and indeed it would be nice as separate PR.

meraldoantonio · 2024-10-08T16:10:58Z

> > Any ideas?
>
> Why are you trying to bound scipy instead of arviz? I would simply bound arviz>=0.18, based on your statements, as well as python_version >= 3.10

Ah ok! I was trying to make it work for python 3.9. I've now locked the python version and arviz version in the latest commit

> PS: why did you close the notebook PR? That was a nice notebook, and indeed it would be nice as separate PR.

Just closed it temporarily as I'm still working on adding the update method. I’ll re-open it once that's done!

fkiraly · 2024-10-08T20:35:04Z

Is this PR ready for merge?

I would really suggest chunking PRs into smaller, self-contained additions, so we can merge your contributions more quickly and PRs do not get too large. For instance, update we can add later and it should not "hurt" (or would it?)

meraldoantonio · 2024-10-09T16:06:10Z

> is this PR ready for merge?

Yes it is!

> I would really suggest chunking PRs into smaller, self-contained additions, so we can merge your contributions more quickly and PRs do not get too large. For instance, update we can add later and it should not "hurt" (or would it?)

Sure, noted! Yes, you're right, we can add the update method in later PRs

Also, I've documented the logic behind this class in this PR, could you please take a look at this when you have time? Thank you!

fkiraly · 2024-10-10T10:09:51Z

> > is this PR ready for merge?
>
> Yes it is!

Then you should adjust the tagging/signposting:

remove the "WIP" tag from the title
turn the PR from draft to ready (top right side menu)
"re-request a review": press the circling arrows next to the pictures of the devs who have already reviewed

fkiraly · 2024-10-10T10:10:27Z

PS: there is a conflict in the pyproject.toml, as your PR does not seem up to date with the recent release. Fixing is as usual: sync your fork, and merge main into your branch.

meraldoantonio · 2024-10-10T16:13:04Z

PS on the update method: Based on my research, there isn't an ideal method for performing an update using PyMC. There are two possible routes; however, both have significant limitations:

Rebuilding the prior from the posterior using marginal 1D empirical approximation: This approach, described in the PyMC documentation, works fine when dealing with a single variable. However, if we have multiple variables (such as in our case, where we have an intercept and a slope per feature), this method is problematic because it loses the correlation structure between variables.
Rebuilding the prior using a multivariate normal approximation: Another approach is to approximate the posterior as a multivariate normal distribution, as discussed in the PyMC Experimental library. While this method retains the correlation between variables, it assumes normality, which may not hold in cases where the posterior is multimodal or skewed.

Instead of introducing a potentially misleading update method, I think it would be better to omit it at this stage.
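The trade-off between the two routes can be seen on synthetic draws. The sketch below does not use PyMC: it fakes correlated "posterior" draws for an intercept and one slope, then mimics route 1 by resampling each marginal independently (a simplified stand-in for a 1D empirical approximation) and route 2 by fitting a bivariate normal. All names and numbers are illustrative assumptions.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)
n = 5000

# Fake posterior draws with a strong negative intercept/slope correlation.
intercept = [rng.gauss(0.0, 1.0) for _ in range(n)]
slope = [-0.9 * a + (1 - 0.9**2) ** 0.5 * rng.gauss(0.0, 1.0) for a in intercept]
joint_corr = pearson(intercept, slope)

# Route 1 (stand-in): rebuild each variable from its own marginal only.
marginal_corr = pearson(rng.choices(intercept, k=n), rng.choices(slope, k=n))

# Route 2: fit a bivariate normal (mean + covariance), sample via Cholesky.
m1, m2 = statistics.fmean(intercept), statistics.fmean(slope)
s11, s22 = statistics.variance(intercept), statistics.variance(slope)
s12 = joint_corr * (s11 * s22) ** 0.5
l11 = s11**0.5
l21 = s12 / l11
l22 = (s22 - l21**2) ** 0.5
mvn_draws = []
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    mvn_draws.append((m1 + l11 * z1, m2 + l21 * z1 + l22 * z2))
mvn_corr = pearson([d[0] for d in mvn_draws], [d[1] for d in mvn_draws])
```

Here `marginal_corr` ends up close to zero while `mvn_corr` stays close to `joint_corr`. The normal fit only looks good because the fake draws are Gaussian to begin with; a skewed or multimodal posterior would not be captured.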

fkiraly

The code looks great!

There are some testing issues:

the failure on 3.9 still seems to be there, even though we clearly set the tag. Could you check if you can understand what is going on there?
get_test_params should be populated with at least two examples.
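A minimal sketch of a `get_test_params` that returns two parameter sets, using a stand-in class rather than the real estimator. The constructor arguments `prior_config` and `sampler_config` follow the PR, but the keys inside the dictionaries are assumptions chosen for illustration and would need to match whatever the estimator actually accepts.

```python
class BayesianLinearRegressorSketch:
    """Stand-in with the same constructor arguments as discussed in the PR."""

    def __init__(self, prior_config=None, sampler_config=None):
        self.prior_config = prior_config
        self.sampler_config = sampler_config

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        """Return two settings: all defaults, and explicit small configs."""
        params1 = {}
        params2 = {
            # Assumed key layout, for illustration only.
            "prior_config": {"slopes": {"mu": 0.0, "sigma": 1.0}},
            # Small sampler settings keep test runs short.
            "sampler_config": {"draws": 100, "tune": 100, "chains": 1},
        }
        return [params1, params2]


# Each parameter set should be usable to construct an instance.
test_instances = [
    BayesianLinearRegressorSketch(**params)
    for params in BayesianLinearRegressorSketch.get_test_params()
]
```

The first set exercises the `None` defaults and the second exercises user-supplied configs, so both code paths get covered by the test suite.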

Ubuntu and others added 2 commits May 22, 2024 02:44

Added pymc as an optional dependency in pyproject.toml

8bfa908

Added WIP Bayesian Linear Regression code and notebook

4634059

meraldoantonio marked this pull request as draft May 23, 2024 17:11

fkiraly reviewed May 23, 2024

meraldoantonio added 2 commits May 23, 2024 17:14

Added comments

be4d12a

Changed MutableData to Data

054a7df

meraldoantonio mentioned this pull request May 24, 2024

[MENTEE] Meraldo Antonio sktime/mentoring#43

Open

fkiraly assigned meraldoantonio May 24, 2024

fkiraly mentioned this pull request May 25, 2024

[MNT] remove legacy base modules #80

Merged

Fixed typo in pyproject.toml

eb54e18

fkiraly reviewed May 31, 2024

fkiraly reviewed May 31, 2024

meraldoantonio added 15 commits June 6, 2024 03:28

BayesianLinearRegressor fitted to skpro template, fit and predict work

eebfb67

Finished predict_proba method in BayesianLinearRegressor

2bacfa0

Fixed indexing bugs

1a373ea

Deleted template comments

3f02932

Added an example in the docstring

4b7cca4

Added a visualize_model method

b0f89c4

Added mutability=True in pm.Data

41857d1

Pinned the version of pymc in pyproject.toml to 5.15.0 (earlier versions have diff. behaviors wrt tensor mutability)

a3b4c64

Removed mutable=True in pm.Data which is to be deprecated

bea9481

Added get_prior method

0016acf

Added get posterior method

c43f836

Added methods to return prior and posterior summary statistics

e4d0933

Added plot_ppc method

a150c5c

Deleted old sample notebook

b0b4bd1

Added example notebook

45698aa

fkiraly reviewed Jun 6, 2024

Meraldo Antonio added 2 commits October 4, 2024 20:44

Removed the dictionary piping syntax and python 3.10 requirements

89c0f6d

Removed the dictionary piping syntax and python 3.10 requirements

55249f4

Meraldo Antonio added 5 commits October 5, 2024 23:06

Commented out graphviz visualization function

434f086

Locked Arviz version

5c886b1

Downgraded scipy version

b4a2e4a

modified scipy version to 1.12 in project toml due to arviz bug

21102b2

Uncommented visualize model

0b3408d

meraldoantonio mentioned this pull request Oct 5, 2024

[DOC] (WIP) Notebook companion to BayesianLinearRegressor (#358) #474

Closed

2 tasks

Removed scipy lock from pyproject toml and added to BayesianLinearRegressor dependencies

535ca75

Locked Python and Arviz version

a3aea79

Added default value of None to prior_config and sampler_config

fd4d5fd

meraldoantonio mentioned this pull request Oct 9, 2024

[DOC] Notebook companion to BayesianLinearRegressor #480

Open

2 tasks

Merged main, resolved conflict

1f0402e

meraldoantonio changed the title ~~[ENH] (WIP) Creating a new Bayesian Regressor with PyMC as a backend~~ [ENH] Creating a new Bayesian Regressor with PyMC as a backend Oct 10, 2024

meraldoantonio marked this pull request as ready for review October 10, 2024 15:39

meraldoantonio requested a review from fkiraly October 10, 2024 15:40

Locked python version for pymc

b7f27aa

fkiraly requested changes Oct 12, 2024