
Numerical differences between 3.11.4 and 4.0.0b #5443

Closed
seanaedmiston opened this issue Feb 2, 2022 · 12 comments · Fixed by #6879

seanaedmiston commented Feb 2, 2022

Numerical problems with quadratic approximation

As per @ericmjl (pymc-devs/pymc-resources#168 (comment)), pymc 3.11.4 and 4.0.0b give different numerical answers for the quadratic approximation examples in Chapter 2 of the Statistical Rethinking course. Below is an excerpt from the Chapter 2 example.

pymc 3.11.4 gives a standard deviation of 0.16 (correct); 4.0.0b gives 0.64. No warnings or errors were apparent in either case.

Please provide a minimal, self-contained, and reproducible example.

#!/usr/bin/env python
# coding: utf-8

import numpy as np
import pymc3 as pm

RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)

# #### Code 2.6
# 
# Computing the posterior using the quadratic approximation (quad).

data = np.repeat((0, 1), (3, 6))
with pm.Model() as normal_approximation:
    p = pm.Uniform("p", 0, 1)  # uniform priors
    w = pm.Binomial("w", n=len(data), p=p, observed=data.sum())  # binomial likelihood
    mean_q = pm.find_MAP()
    std_q = ((1 / pm.find_hessian(mean_q, vars=[p])) ** 0.5)[0]

# display summary of quadratic approximation
print("  Mean, Standard deviation\np {:.2}, {:.2}".format(mean_q["p"], std_q[0]))

Expected Output

  Mean, Standard deviation
p 0.67, 0.16

Actual Output

  Mean, Standard deviation
p 0.67, 0.64

Please provide any additional information below.

Versions and main components

  • PyMC/PyMC3 Version: 4.0.0b
  • Aesara/Theano Version:
  • Python Version: 3.9.7
  • Operating system: MacOS
  • How did you install PyMC/PyMC3: conda (using environments.yml in rethinking_2)

twiecki commented Feb 3, 2022

This is concerning. @ricardoV94, any ideas where this could stem from?


ricardoV94 commented Feb 3, 2022

It seems to have something to do with the variable transform. This provides the same results in V3 and V4:

import numpy as np
import pymc as pm

data = np.repeat((0, 1), (3, 6))
with pm.Model() as m:
    p = pm.Uniform("p", 0, 1, transform=None)  # uniform priors
    w = pm.Binomial("w", n=len(data), p=p, observed=data.sum())  # binomial likelihood
m.compile_d2logp(vars=[p], jacobian=False)({'p': 0.66})
# array([[39.72566178]])  # V3 prints array([[39.72566178]])

But with the default transform we get nonsensical (?) results:

data = np.repeat((0, 1), (3, 6))
with pm.Model() as m:
    p = pm.Uniform("p", 0, 1)  # uniform priors
    w = pm.Binomial("w", n=len(data), p=p, observed=data.sum())  # binomial likelihood
m.compile_d2logp(vars=[p], jacobian=False)({'p_interval__': 0.69})
# array([[2.00209481]])  # V3 prints array([[40.41538223]])
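
For reference, these numbers can be reproduced by hand. Up to a constant, this model's logp is 6*log(p) + 3*log(1-p) (flat prior plus a binomial likelihood with n=9 and 6 successes), so the curvature in p-space is 6/p**2 + 3/(1-p)**2, while in the unconstrained logit space (without the Jacobian term, matching jacobian=False above) it is 9*p*(1-p) with p = sigmoid(y). A quick check with plain numpy:

import numpy as np

def sigmoid(y):
    return 1 / (1 + np.exp(-y))

# curvature in p-space, evaluated at p = 0.66
p = 0.66
print(6 / p**2 + 3 / (1 - p) ** 2)  # 39.7256..., the number both versions agree on

# V4 differentiates in logit space (no Jacobian): 9*p*(1-p) at p = sigmoid(0.69)
p = sigmoid(0.69)
print(9 * p * (1 - p))              # 2.0020..., the V4 number

# V3 still differentiates with respect to p, just evaluated at p = sigmoid(0.69)
print(6 / p**2 + 3 / (1 - p) ** 2)  # 40.4153..., the V3 number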


ricardoV94 commented Feb 3, 2022

So the difference is that in V3, the hessian is computed in terms of p, and in V4 it's computed in terms of p_interval__.
In V3 you can get both results like this:

data = np.repeat((0, 1), (3, 6))
with pm.Model() as m:
    p = pm.Uniform("p", 0, 1)  # uniform priors
    w = pm.Binomial("w", n=len(data), p=p, observed=data.sum())  # binomial likelihood
    mean_q = pm.find_MAP()
    std_q1 = ((1 / pm.find_hessian(mean_q, vars=[m["p"]])) ** 0.5)[0]
    std_q2 = ((1 / pm.find_hessian(mean_q, vars=[m["p_interval__"]])) ** 0.5)[0]
print(mean_q)  # {'p_interval__': array(0.69314718), 'p': array(0.66666667)}
print(std_q1, std_q2)  # [0.15713484] [0.63960215]
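
Both standard deviations also check out analytically at the MAP p = 2/3. The p-space curvature is 6/p**2 + 3/(1-p)**2 = 40.5, giving 1/sqrt(40.5) ≈ 0.1571. The second number is consistent with find_hessian including the interval transform's log-Jacobian log(p*(1-p)) in the transformed logp, i.e. logp(y) = 7*log(sigmoid(y)) + 4*log(1-sigmoid(y)) + const, whose curvature at the MAP is 11*p*(1-p) = 22/9, giving sqrt(9/22) ≈ 0.6396:

p = 2 / 3  # the MAP
# p-space curvature: 6/p**2 + 3/(1-p)**2 = 40.5
print((1 / (6 / p**2 + 3 / (1 - p) ** 2)) ** 0.5)  # 0.15713... = std_q1
# logit-space curvature (with the log-Jacobian folded in): 11*p*(1-p) = 22/9
print((1 / (11 * p * (1 - p))) ** 0.5)             # 0.63960... = std_q2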


ricardoV94 commented Feb 3, 2022

AFAICT this is not possible in V4 because what in V3 was m["p"] does not exist in the Model object. It exists somewhere in the d/logp graph, but we don't have a handle to it, only to m["p_interval__"]. As such, we can't easily ask for the gradient with respect to it.

Having said that, V4 behavior is internally consistent, as the value variable of p is defined on the interval scale. Removing the transformation of p when the model is defined would provide the correct answers (but then find_MAP would probably fail); a sketch is below.
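
For completeness, a minimal sketch of what that would look like, i.e. the original snippet with transform=None carried through (as said, find_MAP will likely fail here, because the optimizer no longer works in an unconstrained space):

import numpy as np
import pymc as pm

data = np.repeat((0, 1), (3, 6))
with pm.Model() as m:
    # no interval transform: the value variable of p lives directly on (0, 1)
    p = pm.Uniform("p", 0, 1, transform=None)
    w = pm.Binomial("w", n=len(data), p=p, observed=data.sum())
    mean_q = pm.find_MAP()  # probably fails: the optimization is now constrained
    std_q = ((1 / pm.find_hessian(mean_q, vars=[p])) ** 0.5)[0]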


twiecki commented Feb 3, 2022

Is this fixed by #5444?

ricardoV94 commented

Nope, it's unrelated.


ricardoV94 commented Feb 5, 2022

If you wanted to do this right now you could remove the transform after calling find_MAP:

import numpy as np
import pymc as pm

data = np.repeat((0, 1), (3, 6))
with pm.Model() as m:
    p = pm.Uniform("p", 0, 1)  # uniform priors
    w = pm.Binomial("w", n=len(data), p=p, observed=data.sum())  # binomial likelihood
    mean_q = pm.find_MAP()

    # Remove transform from the variable `p`
    m.rvs_to_transforms[p] = None

    # Change name so that we can use `mean_q["p"]` value
    p_value = m.rvs_to_values[p]
    p_value.name = p.name
    
    std_q = ((1 / pm.find_hessian(mean_q, vars=[p])) ** 0.5)[0]

    # display summary of quadratic approximation
    print("  Mean, Standard deviation\np {:.2}, {:.2}".format(mean_q["p"], std_q[0]))

#   Mean, Standard deviation
# p 0.67, 0.16

This is another argument for binding transforms less strongly to the model variables. For instance, we could keep them in a {rv: transform} dictionary, similar to the {rv: initial_value} dictionary we have now (related to #5033), and tell users they can swap these after the fact.

In the future we might be able to make transforms an explicit part of the RandomVariable graph (see aesara-devs/aeppl#119), but it will always be a bit tricky to make a clear API to say:

  1. I want to compute something based on the transformed variable (which was the case for find_MAP here)
  2. I want to compute something based on the untransformed variable (which was the case for find_hessian here)

Especially because users are often unaware of the transformations, and most of the time it's unwise to use the untransformed space (case in point: find_MAP fails completely without the transform here).

I would say the original example was a happy accident, where we chose the right representation for find_MAP and the representation the user expected for find_hessian. Another user might have been just as surprised that the result of the hessian was not given in terms of the transformed variable.


danhphan commented Apr 17, 2022

Hi @ricardoV94, is there any update on this one?

It seems I'm facing a similar issue with pm.find_MAP when converting this notebook to PyMC v4.

The details are here: pymc-devs/pymc-examples#73 (comment). Thanks!


ivaquero commented Mar 21, 2023

Hi @ricardoV94, I tried your code, but it raised an error. How can I fix this?

I'm using PyMC 5.1 on macOS (M1).

     9 mean_q = pm.find_MAP()
     11 # Remove transform from the variable `p`
---> 12 normal_approximation.rvs_to_tranforms[p] = None
     14 # Change name so that we can use `mean_q["p"]` value
     15 p_value = normal_approximation.rvs_to_values[p]

AttributeError: 'Model' object has no attribute 'rvs_to_tranforms'

Quoting the comment above (as it read at the time):

If you wanted to do this right now you could remove the transform after calling find_MAP:

import numpy as np
import pymc as pm

data = np.repeat((0, 1), (3, 6))
with pm.Model() as m:
    p = pm.Uniform("p", 0, 1)  # uniform priors
    w = pm.Binomial("w", n=len(data), p=p, observed=data.sum())  # binomial likelihood
    mean_q = pm.find_MAP()

    # Remove transform from the variable `p`
    m.rvs_to_tranforms[p] = None

    # Change name so that we can use `mean_q["p"]` value
    p_value = m.rvs_to_values[p]
    p_value.name = p.name
    
    std_q = ((1 / pm.find_hessian(mean_q, vars=[p])) ** 0.5)[0]

    # display summary of quadratic approximation
    print("  Mean, Standard deviation\np {:.2}, {:.2}".format(mean_q["p"], std_q[0]))

#   Mean, Standard deviation
# p 0.67, 0.16



ricardoV94 commented Mar 21, 2023

There was a typo: it should be rvs_to_transforms. Updated now.

ivaquero commented

Thanks, I'm surprised I didn't spot that.


ricardoV94 commented Sep 4, 2023

I am closing this via #6879. There is new functionality, pymc.model.transform.conditioning.remove_value_transforms, that makes it easier to evaluate a model's logp/hessian on the untransformed space. Its docstring includes the example discussed in this issue.
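
For anyone landing here, a minimal sketch of how that can be applied to this issue's example (the exact docstring recipe may differ slightly; the context-manager usage and the value-variable name "p" after the transform is removed are assumptions here):

import pymc as pm
from pymc.model.transform.conditioning import remove_value_transforms

with pm.Model() as transformed_m:
    p = pm.Uniform("p", 0, 1)
    w = pm.Binomial("w", n=9, p=p, observed=6)
    mean_q = pm.find_MAP()

# clone the model with the interval transform of p removed
with remove_value_transforms(transformed_m) as untransformed_m:
    new_p = untransformed_m["p"]
    std_q = ((1 / pm.find_hessian(mean_q, vars=[new_p])) ** 0.5)[0]

print("  Mean, Standard deviation\np {:.2}, {:.2}".format(mean_q["p"], std_q[0]))
#   Mean, Standard deviation
# p 0.67, 0.16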
