
Allow copy and deepcopy of PYMC models #7492

Merged
merged 9 commits into from
Oct 3, 2024

Conversation

Dekermanjian
Contributor

@Dekermanjian Dekermanjian commented Sep 5, 2024

Description

I added __copy__ and __deepcopy__ methods to the Model class. Both use the clone_model() function from pymc.model.fgraph, and an exception is raised if any Gaussian process variables are detected in the model. I also created two unit tests: one verifying that the methods work, and one verifying that the exception is raised when a Gaussian process variable is detected.
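For readers unfamiliar with the pattern, the dunder-method plumbing looks roughly like this. This is a minimal sketch using a toy stand-in class, not PyMC's actual implementation; clone_model here is a simplified placeholder for pymc.model.fgraph.clone_model:

```python
import copy


class Model:
    """Toy stand-in for pymc.Model, illustrating only the dunder pattern."""

    def __init__(self, named_vars=None):
        self.named_vars = dict(named_vars or {})

    def __copy__(self):
        # Delegate to the graph-cloning helper instead of Python's default
        # shallow copy, so the copy shares no mutable model state.
        return clone_model(self)

    def __deepcopy__(self, memo):
        return clone_model(self)


def clone_model(model):
    # Placeholder for pymc.model.fgraph.clone_model: rebuild from scratch.
    return Model(model.named_vars)


m = Model({"x": 1.0})
m2 = copy.deepcopy(m)
m2.named_vars["z"] = 2.0
assert "z" not in m.named_vars  # mutating the copy leaves the original intact
```

Both `copy.copy(m)` and `copy.deepcopy(m)` route through the same cloning helper, which is the design the PR adopts.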

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc--7492.org.readthedocs.build/en/7492/


welcome bot commented Sep 5, 2024

💖 Thanks for opening this pull request! 💖 The PyMC community really appreciates your time and effort to contribute to the project. Please make sure you have read our Contributing Guidelines and filled in our pull request template to the best of your ability.

@Dekermanjian
Contributor Author

Hey @ricardoV94, sorry if this is a silly question. Do I need to do anything to get the all_tests and mypy checks to start running?

@ricardoV94
Member

CI has to be manually approved for first-time contributors; doing it now.

@Dekermanjian
Contributor Author

Okay! Thank you very much!!


codecov bot commented Sep 6, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 92.44%. Comparing base (2856062) to head (d057a9d).
Report is 16 commits behind head on main.

Files with missing lines Patch % Lines
pymc/model/core.py 66.66% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #7492      +/-   ##
==========================================
+ Coverage   92.15%   92.44%   +0.28%     
==========================================
  Files         103      103              
  Lines       17208    17119      -89     
==========================================
- Hits        15858    15825      -33     
+ Misses       1350     1294      -56     
Files with missing lines Coverage Δ
pymc/model/fgraph.py 97.94% <100.00%> (+0.03%) ⬆️
pymc/model/core.py 91.82% <66.66%> (+0.07%) ⬆️

... and 26 files with indirect coverage changes

Comment on lines 1594 to 1598
check_for_gp_vars = [
k for x in ["_rotated_", "_hsgp_coeffs_"] for k in self.named_vars.keys() if x in k
]
if len(check_for_gp_vars) > 0:
raise Exception("Unable to clone Gaussian Process Variables")
Member

This should be in the low-level utility used by clone_model, fgraph_from_model? Also, I think it should be a warning, because we are not sure whether something with _rotated_ in its name is a GP variable or just a vanilla variable the user happened to name that way.

Also @bwengals any other names we could look for to detect GPs in the model? This is not perfect, but avoiding as much surprise as possible would be great.

Contributor Author

@ricardoV94 Yes, you are absolutely correct. This should be a warning, and yes, I can move it to the lower-level utility used by clone_model. Do you know if there is a way to check a model variable to discern what type it is? Something like type(model.named_vars) that would say it was a GP.

Member

Nope, no way to know it was a GP, that's the unfortunate bit. GPs produce vanilla MvNormals or deterministics

Contributor

Sorry for the very delayed response. No, there isn't. What I've done to smuggle GPs out with the model in the past is to do something like:

with pm.Model() as model:
    # model code here
    gp1 = pm.gp.Latent(...)
    gp2  = pm.gp.HSGP(...)

model.gps = {"gp1": gp1, "gp2": gp2, ...}

If GPs added themselves to the model context automatically they could be tracked. They don't have names though, so need a little thought for how to key that dictionary, although maybe putting them in a list is OK.

Contributor Author

Hey @bwengals, if I understand you correctly, you are saying that we can add GPs to the Model class as an attribute and that would happen in core.py in the section after line 539 and then we can pull those variables just like the rest of the variables in fgraph_from_model() in the section after line 169 inside of the fgraph.py file.
(sorry for the line number references I couldn't figure out how to highlight the code in the comment. I recently just started contributing to open source)

@@ -1761,3 +1762,52 @@ def test_graphviz_call_function(self, var_names, filenames) -> None:
figsize=None,
dpi=300,
)


class TestModelCopy(unittest.TestCase):
Member

We use pytest, any reason we need to subclass unittest?

Contributor Author

Oh, I must have made a mistake. I am sorry I am new to using pytest. I will fix that as well!

@ricardoV94
Member

@Dekermanjian looks great! I left some suggestions, let me know if something does not make sense

@Dekermanjian
Contributor Author

Thank you @ricardoV94 for your feedback! I appreciate it!

@@ -391,6 +393,11 @@ def clone_model(model: Model) -> Model:
z = pm.Deterministic("z", clone_x + 1)

"""
check_for_gp_vars = [
Member

should be in the even lower fgraph_from_model

Contributor Author

Oh! my bad, I will fix that now. Thank you @ricardoV94!

@Dekermanjian
Contributor Author

Hey @ricardoV94, is there anything you'd like me to modify/add to this pull request?

Comment on lines 163 to 167
check_for_gp_vars = [
k for x in ["_rotated_", "_hsgp_coeffs_"] for k in model.named_vars.keys() if x in k
]
if len(check_for_gp_vars) > 0:
warnings.warn("Unable to clone Gaussian Process Variables", UserWarning)
Member

This can be simplified

Suggested change
check_for_gp_vars = [
k for x in ["_rotated_", "_hsgp_coeffs_"] for k in model.named_vars.keys() if x in k
]
if len(check_for_gp_vars) > 0:
warnings.warn("Unable to clone Gaussian Process Variables", UserWarning)
if any(name in ("_rotated_", "_hsgp_coeffs_") for name in model.named_vars):
warnings.warn("Unable to clone Gaussian Process Variables", UserWarning)

Contributor Author

Yeah, that looks better. I will make that change. Thank you!

Contributor Author

Oh, hey @ricardoV94. Since we moved that check to the fgraph_from_model() function maybe the warning should say something different instead of mentioning cloning? Is this function only used for cloning?

Contributor Author

Hey, one more thing. I wasn't able to get the above suggestion to work. I think we need to check whether the substrings ("_rotated_" & "_hsgp_coeffs_") appear in the name of the variable. I was able to get it to work by making a little change:

if any(gp_name in var_name for gp_name in ["_rotated_", "_hsgp_coeffs_"] for var_name in model.named_vars):
    warnings.warn("Unable to clone Gaussian Process Variables", UserWarning)

Is that okay with you?

Member

@ricardoV94 ricardoV94 Sep 27, 2024

Oh sorry of course. Since there are only two options I would do an or statement then: if '_rotated_' in var_name or '_hsgp_coeffs_' in var_name for var_name in ...

Contributor Author

Hey @ricardoV94, I couldn't figure out how to get it to work by only looping through model.named_vars one time like you have in your example. I had to do a loop in each one of the or cases like this:

if any("_rotated_" in var_name for var_name in model.named_vars) or any("_hsgp_coeffs_" in var_name for var_name in model.named_vars):
    warnings.warn("Unable to clone Gaussian Process Variables", UserWarning)

Am I missing something?

Member

if any(("_rotated_" in var_name or "_hsgp_coeffs_" in var_name) for var_name in model.named_vars)
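Extracted as a standalone helper (the function name warn_on_gp_vars is hypothetical; in the PR this check lives inline in fgraph_from_model), the agreed-upon form behaves as follows. Note the substring tests: GP machinery only marks its variables by embedding these fragments inside longer names, so a membership test against a tuple would never match.

```python
import warnings


def warn_on_gp_vars(named_vars):
    # named_vars: iterable of variable names (e.g. model.named_vars).
    # Substring checks, not equality: GP-created variables have names
    # like "f_rotated_", so `name in ("_rotated_", ...)` would miss them.
    if any(("_rotated_" in var_name or "_hsgp_coeffs_" in var_name) for var_name in named_vars):
        warnings.warn("Unable to clone Gaussian Process Variables", UserWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_gp_vars(["x", "f_rotated_"])  # triggers the warning
    warn_on_gp_vars(["x", "y"])           # stays silent
assert len(caught) == 1
```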

Member

@ricardoV94 ricardoV94 Sep 28, 2024

Also let's give a more informative warning. Something like: "Detected variables likely created by GP objects. Further use of these old GP objects should be avoided as it may reintroduce variables from the old model. See issue: https://github.com/pymc-devs/pymc/issues/6883".

Contributor Author

Thanks @ricardoV94, I have made the suggestions and pushed the changes.

@@ -369,7 +377,7 @@ def clone_model(model: Model) -> Model:

Recreates a PyMC model with clones of the original variables.
Shared variables will point to the same container but be otherwise different objects.
Constants are not cloned.
Constants are not cloned and if guassian process variables are detected then a warning will be triggered.
Member

Let's not mention it, this is just a temporary thing we want to fix

Suggested change
Constants are not cloned and if guassian process variables are detected then a warning will be triggered.
Constants are not cloned.

Contributor Author

Okay, I will make that change as well!

@Dekermanjian
Contributor Author

Thank you @ricardoV94 for your help with this and for your patience. I really appreciate it.

Member

@ricardoV94 ricardoV94 left a comment

Hi, I had another pass and I have a couple minor suggestions. I think this should be it!

Thanks for your patience as well 🙏

pm.Normal("y", f * 2)
return gp_model

def test_copy_model(self) -> None:
Member

These tests have a lot of duplicated code that we can avoid if you use pytest.mark.parametrize("copy_method", (copy.copy, copy.deepcopy))
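The parametrization pattern being suggested looks like this, sketched here with a plain dictionary standing in for the model fixture so it runs without PyMC (the test name and fixture are illustrative, not the PR's actual test):

```python
import copy

import pytest


@pytest.mark.parametrize("copy_method", (copy.copy, copy.deepcopy))
def test_copy_roundtrip(copy_method):
    # One test body covers both copy flavors; pytest runs it once per method.
    original = {"alpha": [1, 2, 3]}
    clone = copy_method(original)
    assert clone == original
    assert clone is not original
```

This removes the duplicated bodies of separate test_copy_model and test_deepcopy_model functions while keeping one reported test per copy method.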


class TestModelCopy:
@staticmethod
def simple_model() -> pm.Model:
Member

No need to use staticmethods as these aren't used in more than one single test?

deepcopy_simple_model = copy.deepcopy(simple_model)

with simple_model:
simple_model_prior_predictive = pm.sample_prior_predictive(random_seed=42)
Member

@ricardoV94 ricardoV94 Sep 29, 2024

Taking a single draw should be enough.

I would also test that adding a deterministic to the copy model does not introduce one in the original model (basically the example you had in the docstrings)

pymc/model/core.py (resolved review thread)

def test_guassian_process_copy_failure(self) -> None:
gaussian_process_model = self.gp_model()
with pytest.warns(UserWarning):
Member

Add a match kwarg to check the UserWarning is actually the one we care about

Suggested change
with pytest.warns(UserWarning):
with pytest.warns(UserWarning, match=...):
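For illustration, match is treated as a regular expression applied to the warning message, so a distinctive fragment is enough; the emit helper below is hypothetical, standing in for the copy call that triggers the warning:

```python
import warnings

import pytest


def emit():
    # Stand-in for copy.copy(gaussian_process_model), which warns in the PR.
    warnings.warn("Unable to clone Gaussian Process Variables", UserWarning)


# pytest.warns also works as a plain context manager outside a test run;
# it raises if the warning is missing or its message doesn't match the regex.
with pytest.warns(UserWarning, match="Unable to clone"):
    emit()
```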

…inistics to clone model, added copy method to Model class
@Dekermanjian
Contributor Author

Hey Ricardo, I made the suggested changes. The tests look a lot better now. Thank you for reviewing it!

Comment on lines 1582 to 1583
Clone a pymc model by overiding the python copy method using the clone_model method from fgraph.
Constants are not cloned and if guassian process variables are detected then a warning will be triggered.
Member

@ricardoV94 ricardoV94 Sep 30, 2024

Suggested change
Clone a pymc model by overiding the python copy method using the clone_model method from fgraph.
Constants are not cloned and if guassian process variables are detected then a warning will be triggered.
Clone the model.
To access variables in the cloned model use `cloned_model["var_name"]`.

Comment on lines 1789 to 1794
simple_model_prior_predictive_mean = simple_model_prior_predictive["prior"]["y"].mean(
("chain", "draw")
)
copy_simple_model_prior_predictive_mean = copy_simple_model_prior_predictive["prior"][
"y"
].mean(("chain", "draw"))
Member

No need to take the mean, now that it's a single value. Just retrieve it with simple_model_prior_predictive.prior["y"].values

copy_method(gaussian_process_model)

@pytest.mark.parametrize("copy_method", (copy.copy, copy.deepcopy))
def test_adding_deterministics_to_clone(self, copy_method) -> None:
Member

This check can be done in the first test. That way you can also confirm the prior_predictive["z"] draw is what you expect (and only exists in the cloned model)

"y"
].mean(("chain", "draw"))

assert np.isclose(
Member

You can check exact equality, since the draws are exactly the same

@Dekermanjian
Contributor Author

Hey Ricardo, I made the changes you suggested. There is one part of the test I wasn't sure could be done more simply. When I sample a prior predictive that includes the deterministic, the first values for y differ between the copy model and the original model. So I have to sample prior predictives twice: once without the deterministic and once with it. Is there any way that can be simplified?

Member

@ricardoV94 ricardoV94 left a comment

Very minor nitpicks to reduce test verbosity.

Comment on lines 1792 to 1796
with copy_simple_model:
z = pm.Deterministic("z", copy_simple_model["alpha"] + 1)
copy_simple_model_prior_predictive = pm.sample_prior_predictive(
samples=1, random_seed=42
)
Member

You can do this above, and call sample_prior_predictive only once for the copy_simple_model

Comment on lines 1785 to 1786
simple_model_prior_predictive_val = simple_model_prior_predictive["prior"]["y"].values
copy_simple_model_prior_predictive_val = copy_simple_model_prior_predictive["prior"][
Member

Just compare directly; no need to assign to separate variables that are almost as verbose as the way they are accessed.

@ricardoV94
Member

It seems I keep trying to make the tests smaller. Very sorry! Let me know if it feels too much :)

@ricardoV94
Member

So I have to sample prior predictives twice: once without the deterministic and once with it. Is there any way that can be simplified?

Ah... I didn't expect that, it shouldn't be the case in theory. Let me have a look, but we can leave as is if true

@Dekermanjian
Contributor Author

It seems I keep trying to make the tests smaller. Very sorry! Let me know if it feels too much :)

Not a problem at all. It is for the best to remove any redundancies.

@Dekermanjian
Contributor Author

So I have to sample prior predictives twice: once without the deterministic and once with it. Is there any way that can be simplified?

Ah... I didn't expect that, it shouldn't be the case in theory. Let me have a look, but we can leave as is if true

I tried this again. When I sample prior predictive with the copy model that includes the deterministic, the value of the variable 'y' no longer matches the value from the original model, and the assert statement doesn't pass.

@ricardoV94
Member

ricardoV94 commented Oct 1, 2024

You're right. I was testing with a simpler model and it worked fine. With multiple random variables the seeding will act differently, but with a single random variable it's fine. I guess we can use a single RV for this example, since we are not even checking the others?

import pymc as pm
from pymc.model.fgraph import clone_model
with pm.Model() as m:
    x = pm.Normal("x")
    print(pm.sample_prior_predictive(draws=1, random_seed=1).prior["x"].values)     
    
with clone_model(m) as new_m:
    y = pm.Deterministic("y", new_m["x"] + 1)
    print(pm.sample_prior_predictive(draws=1, random_seed=1).prior["x"].values)            
Sampling: [x]
[[-0.64031853]]
Sampling: [x]
[[-0.64031853]]

@Dekermanjian
Contributor Author

import pymc as pm
from pymc.model.fgraph import clone_model
with pm.Model() as m:
    x = pm.Normal("x")
    print(pm.sample_prior_predictive(draws=1, random_seed=1).prior["x"].values)     
    
with clone_model(m) as new_m:
    y = pm.Deterministic("y", new_m["x"] + 1)
    print(pm.sample_prior_predictive(draws=1, random_seed=1).prior["x"].values)  

Hmm, I can get your example to work locally, but it still gives me a different value for my example (screenshot omitted).

@Dekermanjian
Contributor Author

I also notice that if my deterministic is a function of alpha or error it will change the value of the prior predictive. However, if my deterministic is a function of the likelihood (like in your example) then the prior values are exactly the same.

@ricardoV94
Member

ricardoV94 commented Oct 2, 2024

Yes, I said so above, but maybe it was not obvious. With multiple random variables, the deterministic ends up changing which seed goes to which variable (after they are split internally). But your test does not need the extra random variables right? The simpler, single random variable should be enough.

It doesn't matter if it's observed or not for prior_predictive.

@Dekermanjian
Contributor Author

Yes, I said so above, but maybe it was not obvious. With multiple random variables, the deterministic ends up changing which seed goes to which variable (after they are split internally). But your test does not need the extra random variables right? The simpler, single random variable should be enough.

It doesn't matter if it's observed or not for prior_predictive.

Okay, this makes sense. This is because PyTensor gives each RV a unique seed, correct? You are right that for the purposes of testing we don't need extra RVs. I made the simplification and pushed the changes. Thank you, Ricardo.

@ricardoV94
Member

Thanks a ton @Dekermanjian 😊

@ricardoV94 ricardoV94 changed the title PYMC Model copy and deepcopy override methods Allow copy and deepcopy of PYMC models Oct 2, 2024
@Dekermanjian
Contributor Author

@ricardoV94 Happy to help! :)

Ricardo, I have a quick question for you. I am trying to learn the lower-level code for PyTensor so that I can contribute in a more meaningful way. I have walked through the PyTensor dev tutorials, but they are a little terse. I am wondering if there are more resources for learning the low-level code, maybe an example somewhere that shows how a simple PYMC model is constructed without the high-level abstractions: defining the graph, the logp, and the sampling algorithm from scratch. Is something like that available?

@ricardoV94 ricardoV94 merged commit cdcdb58 into pymc-devs:main Oct 3, 2024
22 checks passed
@ricardoV94
Member

@Dekermanjian have you seen this guide? https://www.pymc.io/projects/docs/en/stable/learn/core_notebooks/pymc_pytensor.html

I have actually been thinking about doing a series on implementing the core of PyMC in a live stream, which you might have liked if it already existed? :)

@Dekermanjian
Contributor Author

https://www.pymc.io/projects/docs/en/stable/learn/core_notebooks/pymc_pytensor.html

@ricardoV94 I have not seen that one before! Thank you this will be very helpful.

I have actually been thinking about doing a series on implementing the core of PyMC in a live stream, which you might have liked if it already existed? :)

Yes, that would be fantastic. PYMC is wonderful for users because of the abstractions but for those who are interested in contributing development efforts it is difficult to connect the pieces just by going through the code base.

ricardoV94 added a commit to ricardoV94/pymc that referenced this pull request Oct 8, 2024
PRs pymc-devs#7508 and pymc-devs#7492 introduced incompatible changes but were not tested simultaneously.

Deepcopying the steps in the tests leads to deepcopying the model which uses `clone_model`, which in turn does not support initvals.
ricardoV94 added a commit that referenced this pull request Oct 8, 2024
PRs #7508 and #7492 introduced incompatible changes but were not tested simultaneously.

Deepcopying the steps in the tests leads to deepcopying the model which uses `clone_model`, which in turn does not support initvals.
Successfully merging this pull request may close these issues:

Implement __copy__ and __deepcopy__ model methods

3 participants