Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordinal regression docs #719

Merged
merged 13 commits into from
Sep 28, 2023
Merged

Conversation

GStechschulte
Copy link
Collaborator

@GStechschulte GStechschulte commented Sep 13, 2023

This draft PR contains a notebook with an ordinal regression model for the newly added cumulative family.

First, it is explained why ordered categorical outcomes require special treatment. Secondly, the basics of ordinal regression are explained, motivated through the use of the cumulative link function. Next, an intercept only model is fit to explain how the cumulative link function "describes" an ordered distribution. Lastly, a model with predictors is developed.

I will also add a section with the sratio family.

Edit by Tomas Capretto: Closes #583

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@tomicapretto
Copy link
Collaborator

tomicapretto commented Sep 15, 2023

Hi @GStechschulte! I recommend to rebase so it runs the latest version of the CI :)

Btw, is there anything I can do here to help you?

@GStechschulte
Copy link
Collaborator Author

GStechschulte commented Sep 15, 2023

Hi @GStechschulte! I recommend to rebase so it runs the latest version of the CI :)

Btw, is there anything I can do here to help you?

Hey, and will do! Thanks :) Not necessarily. Although, there is a problem with ordinal models and the interpret sub-package. Since an ordinal model prediction is a vector of probabilities of length $K$, we get shape errors when attempting to use any of the functions in interpret 😵‍💫.

I need to see if marginaleffects supports ordinal models, and see if we can adopt their solution.

@tomicapretto
Copy link
Collaborator

Hi @GStechschulte! I recommend to rebase so it runs the latest version of the CI :)
Btw, is there anything I can do here to help you?

Hey, and will do! Thanks :) Not necessarily. Although, there is a problem with ordinal models and the interpret sub-package. Since an ordinal model prediction is a vector of probabilities of length K, we get shape errors when attempting to use any of the functions in interpret 😵‍💫.

I need to see if marginaleffects supports ordinal models, and see if we can adopt their solution.

Isn't it similar to the case of the "categorical" family?

@GStechschulte
Copy link
Collaborator Author

@tomicapretto yes, it should be similar to the categorical family.

@GStechschulte GStechschulte force-pushed the ordinal-examples branch 2 times, most recently from d06e32d to fb1724a Compare September 19, 2023 18:57
@GStechschulte
Copy link
Collaborator Author

GStechschulte commented Sep 19, 2023

I think this PR is almost there. However, I am still questioning the interpretation of the threshold coefficients for the sratio model. I will paste my Slack message here.

From my understanding, for ordinal models with the sratio family and logit link, the interpretation of the threshold coefficients are not cumulative-logits, but rather logits. This is in part because the ordering of the $k$ thresholds does not matter since each outcome is characterised by its own latent distribution. Whereas ordinal models with family=cumulative and link=logit, the coefficients of the thresholds are cumulative-logits.

However, when I plot the logits of the coefficients for the sratio model, it appears as though they are “partially” cumulative (overall, the probability increases as category increases, but for some categories the probability decreases which is not possible under a cumulative specification).

Plotting the PyMC graph for the sratio model, there are no constraints.

@GStechschulte GStechschulte marked this pull request as ready for review September 20, 2023 14:02
@tomicapretto
Copy link
Collaborator

Hi Gabriel! This is a fantastic example, it's not simple at all and you're making a great job. I left some comments with suggestions and thoughts. Also:

  • Can you silence FutureWarnings?
  • Is the warning about the usage of the C-API making the model(s) run slower?
  • I'm starting to think the prior for the threshold in the StoppingRatio family is not the most adequate. The means are sorted, but there's no reason to do that. So I'm thinking it makes more sense to have something like Normal([0, 0, 0..., 0]) instead of what we have now here
    mu = np.round(np.linspace(-2, 2, num=response_level_n - 1), 2)

    what do you think about this last point?

@GStechschulte
Copy link
Collaborator Author

GStechschulte commented Sep 22, 2023

Thank you! It has been a fun notebook to implement 😄

  • Can you silence FutureWarnings?

Done.

  • Is the warning about the usage of the C-API making the model(s) run slower?

Yes, it "feels" like it. I haven't ran any timed experiments though.

  • I'm starting to think the prior for the threshold in the StoppingRatio family is not the most adequate. The means are sorted, but there's no reason to do that. So I'm thinking it makes more sense to have something like Normal([0, 0, 0..., 0]) instead of what we have now here
    mu = np.round(np.linspace(-2, 2, num=response_level_n - 1), 2)

Agreed, since the ordering of the thresholds doesn't matter, doing something like mu = np.zeros(response_level_n - 1) makes more sense. I will try it out. What does brms do?

Update:

Changing the default threshold prior:

response_level_n = len(attrition["YearsAtCompany"].unique())
mu = np.zeros(response_level_n - 1)
threshold_prior = {"threshold": bmb.Prior("Normal", mu=mu, sigma=1)}

sequence_model = bmb.Model(
    "YearsAtCompany ~ 0 + TotalWorkingYears", 
    data=attrition, 
    family="sratio", 
    priors=threshold_prior
)

sequence_model
       Formula: YearsAtCompany ~ 0 + TotalWorkingYears
        Family: sratio
          Link: p = logit
  Observations: 1233
        Priors: 
    target = p
        Common-level effects
            TotalWorkingYears ~ Normal(mu: 0.0, sigma: 0.3223)
        
        Auxiliary parameters
            threshold ~ Normal(mu: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
             0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.], sigma: 1.0)

works as expected. For now, I am thinking I keep this and explain how to change the default prior for the thresholds. Then, we could open a separate PR? What do you think?

@tomicapretto
Copy link
Collaborator

@GStechschulte

For now, I am thinking I keep this and explain how to change the default prior for the thresholds. Then, we could open a separate PR? What do you think?

I know it's more "tidy" to do it in a separate PR, but I think it's OK to do it in this one too. Anyway, I leave it up to you. Whatever you want is OK!

1 similar comment
@tomicapretto
Copy link
Collaborator

@GStechschulte

For now, I am thinking I keep this and explain how to change the default prior for the thresholds. Then, we could open a separate PR? What do you think?

I know it's more "tidy" to do it in a separate PR, but I think it's OK to do it in this one too. Anyway, I leave it up to you. Whatever you want is OK!

@GStechschulte
Copy link
Collaborator Author

GStechschulte commented Sep 22, 2023

I implemented the zero mu vector in this PR 😄 and added a section in the notebook explaining the differences in the default priors for cumulative and sratio.

@codecov-commenter
Copy link

codecov-commenter commented Sep 22, 2023

Codecov Report

Merging #719 (5c57e55) into main (e53f8da) will not change coverage.
Report is 2 commits behind head on main.
The diff coverage is 100.00%.

❗ Current head 5c57e55 differs from pull request most recent head 075f537. Consider uploading reports for the commit 075f537 to get more accurate results

@@           Coverage Diff           @@
##             main     #719   +/-   ##
=======================================
  Coverage   89.56%   89.56%           
=======================================
  Files          44       44           
  Lines        3525     3525           
=======================================
  Hits         3157     3157           
  Misses        368      368           
Files Coverage Δ
bambi/priors/scaler.py 96.70% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -114,7 +114,7 @@ def scale_threshold(self):
threshold = self.model.components["threshold"]
if isinstance(threshold, ConstantComponent) and threshold.prior.auto_scale:
response_level_n = len(np.unique(self.response_component.response_term.data))
mu = np.round(np.linspace(-2, 2, num=response_level_n - 1), 2)
mu = np.zeros(response_level_n - 1)
Copy link
Collaborator Author

@GStechschulte GStechschulte Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sequential models assume that for every response level there is a latent continuous variable $Z_k$, then wouldn't we need each response level? Thus, mu should be mu = response_level_n and not response_level_n - 1?

Copy link
Collaborator

@tomicapretto tomicapretto Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, I think you're right!

I'm thinking why the current approach is working and not failing. Is it because it's not considering the probability of Y being larger than the largest observed category? I think it would make sense for the years example, but I'm not sure if it would make sense for cases where there is a pre-specified set of categories.

I wrote that as I was looking at this visualization from the ordinal tutorial by Bürkner and Vuorre:

image

and I wonder: Do we always have a Pr(Y > K)? (as in the Y > 3 in the figure)

@GStechschulte if you make that modification and run the example, does it work?

Copy link
Collaborator Author

@GStechschulte GStechschulte Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove the -1, I get ValueError: Incompatible Elemwise input shapes [(35,), (36,)].

This makes sense as I stated in the docs because the sequential model is a product of probabilities, i.e., the probability that $Y$ is equal to category $k$ is equal to the probability that it did not fall in one of the former categories $1: k-1$ multiplied by the probability that the sequential process stopped at $k$.

In the case of the attrition dataset, there are 36 response categories. Because of the statement above, this makes sense why the probability of category 36 is 1. There is no category after 36, so once you multiply all of the previous probabilities with the current category, you get 1. Thus, you don't need a parameter (threshold) for it.

@tomicapretto
Copy link
Collaborator

tomicapretto commented Sep 25, 2023

@GStechschulte on top of the comment that you raised, there are two nits. After that, feel free to merge. Excellent work!!

edit: closed by accident haha

@GStechschulte
Copy link
Collaborator Author

Git got me good on this one 😢

@tomicapretto
Copy link
Collaborator

I'm not sure if I follow what happened. Do you need help?

@GStechschulte GStechschulte merged commit eaae9c5 into bambinos:main Sep 28, 2023
1 of 4 checks passed
@GStechschulte
Copy link
Collaborator Author

GStechschulte commented Sep 28, 2023

I'm not sure if I follow what happened. Do you need help?

No haha. My local branch somehow diverged from this remote branch and I attempted to fix the conflicts manually. At the end, the easiest was to force push. By the way, thanks for the reviews!

GStechschulte added a commit to GStechschulte/bambi that referenced this pull request Oct 3, 2023
* ordinal model with cumulative link notebook

* ordinal model with cumulative link function

ordinal models (cumulative and sratio)

* unified explanation for cumulative and sequential models

* sratio model and data

* code review changes

* remove intercept in models

* zero mu vector prior for sratio family

* code review and add section on default priors

* explicit explanation of K and k and added summary section

* Zero inflated docs (bambinos#725)

* zero inflated poisson and hurdle poisson models

* grammar fix and sort imports

* interpret coeff. and model comparison section

* code review changes

* change wording in hurdle Poisson section

* change posterior predictive bins to use np.arange

* ordinal model with cumulative link function

ordinal models (cumulative and sratio)

* use plot_ppc_discrete for posterior predictive samples

* add plots explaining the ordinal outcome of the dataset

---------

Co-authored-by: Gabriel Stechschulte <[email protected]>
@GStechschulte GStechschulte deleted the ordinal-examples branch January 21, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ordinal models example
3 participants