
Divergences in examples/generalized_linear_models/GLM-ordinal-regression.ipynb #649

Open
erik-werner opened this issue Mar 21, 2024 · 5 comments

Comments

@erik-werner
Contributor

Regression Models with Ordered Categorical Outcomes:
https://www.pymc.io/projects/examples/en/latest/generalized_linear_models/GLM-ordinal-regression.html

Issue description

After make_model() is defined, five different models are created and sampled from. Unfortunately, models 1 and 2 produce hundreds of divergences. Models 3 and 4 sample fine. Model 5 is identical to model 4; presumably the argument logit=False is missing. If I add that argument, 2 of 4 chains get completely stuck during sampling.

These divergences were not reported automatically because the numpyro sampler was used. However, since the most recent release, divergence warnings are emitted for this sampler as well. 👍

Proposed solution

Fix the models so that they can be sampled from. If that turns out to be difficult, consider whether they can be removed, or add a discussion of the problems with these models.

@erik-werner
Contributor Author

erik-werner commented Mar 21, 2024

@NathanielF would you like to have a look at this? Or do you want me to see if I can fix it?

@NathanielF
Contributor

Feel free to have a look, @erik-werner! Thanks for bringing it up.

@NathanielF
Contributor

Just a note here @erik-werner

Had a quick look. I think the code you want to run is:


# Assumes the notebook's context: df, K, and constrainedUniform are defined earlier.
def make_model(priors, model_spec=1, constrained_uniform=False, logit=True):
    with pm.Model() as model:
        if constrained_uniform:
            cutpoints = constrainedUniform(K, 0, K)
        else:
            sigma = pm.Exponential("sigma", priors["sigma"])
            cutpoints = pm.Normal(
                "cutpoints",
                mu=priors["mu"],
                sigma=sigma,
                transform=pm.distributions.transforms.univariate_ordered,
            )

        if model_spec == 1:
            beta = pm.Normal("beta", priors["beta"][0], priors["beta"][1], size=1)
            mu = pm.Deterministic("mu", beta[0] * df.salary)
        elif model_spec == 2:
            beta = pm.Normal("beta", priors["beta"][0], priors["beta"][1], size=2)
            mu = pm.Deterministic("mu", beta[0] * df.salary + beta[1] * df.work_sat)
        elif model_spec == 3:
            beta = pm.Normal("beta", priors["beta"][0], priors["beta"][1], size=3)
            mu = pm.Deterministic(
                "mu", beta[0] * df.salary + beta[1] * df.work_sat + beta[2] * df.work_from_home
            )
        if logit:
            y_ = pm.OrderedLogistic("y", cutpoints=cutpoints, eta=mu, observed=df.explicit_rating)
        else:
            y_ = pm.OrderedProbit("y", cutpoints=cutpoints, eta=mu, observed=df.explicit_rating)
        if not logit:
            idata = pm.sample(idata_kwargs={"log_likelihood": True})
        else:
            idata = pm.sample(nuts_sampler="numpyro", idata_kwargs={"log_likelihood": True})
        idata.extend(pm.sample_posterior_predictive(idata))
    return idata, model

priors = {"sigma": 1, "beta": [0, 1], "mu": np.linspace(0, K, K - 1)}
idata1, model1 = make_model(priors, model_spec=1)
idata2, model2 = make_model(priors, model_spec=2)
idata3, model3 = make_model(priors, model_spec=3)
idata4, model4 = make_model(priors, model_spec=3, constrained_uniform=True)
idata5, model5 = make_model(priors, model_spec=3, constrained_uniform=True, logit=False)


Models 1 and 2 are somewhat underspecified, so you will get divergences. The interesting contrast is between models 3 and 4, where we use the constrained uniform prior on the cutpoints. If you fix the fifth model to use the default sampler rather than numpyro, you should be able to run the probit fit.
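For context, a minimal NumPy sketch of the idea usually behind such a constrained-uniform cutpoint prior (I'm assuming the notebook's constrainedUniform follows the standard Dirichlet-cumsum pattern; ordered_cutpoints here is a hypothetical illustration, not the notebook's actual helper):

```python
import numpy as np

def ordered_cutpoints(weights, lower, upper):
    # Normalize positive weights onto the simplex, then take the cumulative
    # sum scaled to [lower, upper]. The cumsum guarantees strictly increasing
    # cutpoints by construction, which sidesteps the ordering pathologies a
    # Normal prior with an ordered transform can run into.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return lower + (upper - lower) * np.cumsum(w)[:-1]  # drop the upper endpoint

cuts = ordered_cutpoints([1.0, 2.0, 1.0, 2.0], 0, 4)
# cuts is strictly increasing and stays strictly inside (0, 4)
```

Because ordering is baked into the construction rather than enforced by a transform, the sampler never has to navigate near-degenerate cutpoint configurations.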

This is also arguably an interesting contrast just to see how the parameter values shift...
I wouldn't remove models 1 and 2, but I'd add text discussing how a better-specified model avoids the divergences you've highlighted.


@erik-werner
Contributor Author

erik-werner commented Mar 25, 2024

@NathanielF These were basically my conclusions as well 🙂
One more thing: it seems the divergences for models 1 and 2 occur when sigma is very close to 0, so in principle one could probably eliminate most of them with a non-centered parametrisation. But that doesn't combine well with the ordered transform.
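For reference, the non-centered trick in question (sample at unit scale, then shift and rescale deterministically, so the sampler's geometry is decoupled from sigma) can be sketched in plain NumPy; this is just the algebraic idea, not a working fix for the cutpoint model:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.1  # small sigma is exactly where the centered form diverges

# Centered form: draw directly at scale sigma (what the notebook's prior does).
centered = rng.normal(mu, sigma, size=1000)

# Non-centered form: draw standard-normal offsets, then rescale.
z = rng.normal(size=1000)
noncentered = mu + sigma * z

# Both target the same Normal(mu, sigma); the non-centered version keeps the
# sampled variable (z) at unit scale even as sigma -> 0. The catch noted
# above: the ordered transform would have to apply to the rescaled values,
# not to z, which is why the two techniques don't compose cleanly.
```

The usage note above is the whole point: the reparametrisation is trivial for an unconstrained Normal, but the ordering constraint lives on the transformed scale.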

I wonder why the numpyro sampler cannot handle the probit model.

Anyway, since you have a better overview of the whole flow, and seem to have an idea of what to write, it would be great if you could update the notebook.

@NathanielF
Contributor

Sure thing @erik-werner. Will do.

On numpyro, I think it has to do with the lack of parity between the JAX implementation of the erfc function and the one in SciPy.
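For background on why erfc matters here: the probit link is the standard normal CDF, which implementations typically compute via the identity Φ(x) = ½·erfc(−x/√2). A quick stdlib check of that identity (this illustrates the relationship only; it is not numpyro's or SciPy's actual code path):

```python
import math
from statistics import NormalDist

def probit_cdf(x):
    # Phi(x) = 0.5 * erfc(-x / sqrt(2)) -- the identity probit links rely on.
    return 0.5 * math.erfc(-x / math.sqrt(2))

ref = NormalDist()  # standard normal CDF for comparison
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    assert abs(probit_cdf(x) - ref.cdf(x)) < 1e-12
```

If two backends' erfc implementations disagree in the tails, the probit log-likelihood gradients can differ enough to stall a chain, which would be consistent with the stuck chains reported above.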

Thanks again for bringing the issue to my attention.
