Add summary #11

Open
OriolAbril opened this issue Jun 21, 2024 · 4 comments
Comments

@OriolAbril
Member

Add a summary function.

It might help to have a helper to "compile" multiple functions into a single ufunc with tuple output. Before diving too deep into this, though, a quick benchmark should be done to make sure it gives some advantage.
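
A minimal sketch of what such a helper and benchmark could look like, using only NumPy. The name `compile_stats` is hypothetical, and `np.vectorize` with a `signature` is just a stand-in for whatever ufunc machinery the library ends up using; it loops in Python, so the timings only illustrate how to set up the comparison, not what the result would be.

```python
import timeit

import numpy as np


def compile_stats(*funcs):
    """Hypothetical helper: fuse several 1D -> scalar functions into one
    gufunc-style callable that returns a tuple with one array per function."""
    if len(funcs) < 2:
        raise ValueError("expected at least two functions to fuse")

    def fused(draws):
        return tuple(func(draws) for func in funcs)

    # Core signature such as "(n)->(),()": consume the last axis and
    # emit one scalar per fused function.
    signature = "(n)->" + ",".join("()" for _ in funcs)
    return np.vectorize(fused, signature=signature)


rng = np.random.default_rng(0)
samples = rng.normal(size=(50, 1000))  # 50 variables, 1000 draws each

mean_sd = compile_stats(np.mean, np.std)
means, sds = mean_sd(samples)  # two arrays of shape (50,)


def separate_calls():
    # Baseline: one vectorized call per statistic.
    mean_only = np.vectorize(np.mean, signature="(n)->()")
    sd_only = np.vectorize(np.std, signature="(n)->()")
    return mean_only(samples), sd_only(samples)


t_fused = timeit.timeit(lambda: mean_sd(samples), number=5)
t_separate = timeit.timeit(separate_calls, number=5)
print(f"fused: {t_fused:.4f}s, separate: {t_separate:.4f}s")
```

Whether the fused version wins depends on the per-call overhead of the real wrapper, which is exactly what the benchmark would need to measure.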

@aloctavodia
Contributor

aloctavodia commented Jul 16, 2024

It's probably a good time to rethink the summary function. I am mostly happy with our current approach, but we should think about how to accommodate at least some of the metrics (like min_ss) described at http://mc-stan.org/posterior/articles/pareto_diagnostics.html. Maybe we could have 3 sets of keywords to mix as we want:

"stats": mean, sd, hdi_3%, hdi_97%
"c_diagnostics": ess_bulk, ess_tail, and r_hat.
"mc_diagnostics": mcse_mean, ess_basic, pareto_khat, min_ss

"c_diagnostics" and "mc_diagnostics" are not very good names.

Also, I'm not sure what to include in "mc_diagnostics". I would like something as compact as possible, maybe just mcse_mean, ess_basic, and min_ss?
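
The three keyword sets above could be mixed along these lines. This is only a sketch: `SUMMARY_GROUPS` and `select_columns` are hypothetical names, not an existing API, and the group names are the provisional ones from this comment.

```python
# Provisional groups taken from the proposal above; names are placeholders.
SUMMARY_GROUPS = {
    "stats": ["mean", "sd", "hdi_3%", "hdi_97%"],
    "c_diagnostics": ["ess_bulk", "ess_tail", "r_hat"],
    "mc_diagnostics": ["mcse_mean", "ess_basic", "pareto_khat", "min_ss"],
}


def select_columns(kinds):
    """Return the summary columns for the requested groups, in request order."""
    columns = []
    for kind in kinds:
        if kind not in SUMMARY_GROUPS:
            raise ValueError(f"unknown group {kind!r}; choose from {list(SUMMARY_GROUPS)}")
        columns.extend(SUMMARY_GROUPS[kind])
    return columns


print(select_columns(["stats", "c_diagnostics"]))
```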

@sethaxen
Member

sethaxen commented Sep 3, 2024

> "c_diagnostics" and "mc_diagnostics" are not very good names

What are these two categories meant to represent?

@aloctavodia
Contributor

aloctavodia commented Sep 3, 2024

c_diagnostics is related to convergence diagnostics and mc_diagnostics is related to Monte Carlo error.
Both are related, but in a very schematic way I see the first as the diagnostics you check earlier in the workflow and the latter as the ones you check when you are closer to reporting results.

About the names, maybe we could use "convergence" for the first and "mc_error" or "precision" for the second

@sethaxen
Member

sethaxen commented Sep 3, 2024

Got it, makes sense. I like the idea of dividing the diagnostics by workflow stages. But I also like the current approach where the user requests either moments or quantile-based estimates of location and spread (roughly what "focus" does).

Re "mc_diagnostics": I wonder if it makes sense to adjust the columns based on user-provided expectation(s). For example, if a user requests a moment, then the moment, its MCSE, and the Pareto diagnostics for that moment are included. If a user requests a quantile, then the quantile and its MCSE are included (Pareto diagnostics make no sense here).
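
As a rough sketch of that idea, the columns could be derived from the requested estimates. Everything here is an assumption for illustration: the function name `columns_for`, the convention that moments are strings and quantiles are probabilities given as floats, and the column naming scheme.

```python
MOMENTS = ("mean", "sd")  # assumed set of supported moments


def columns_for(requests):
    """Map requested estimates to summary columns.

    Moments get the estimate, its MCSE, and Pareto diagnostics;
    quantiles (floats in (0, 1)) get the estimate and its MCSE only.
    """
    columns = []
    for request in requests:
        if request in MOMENTS:
            columns += [
                request,
                f"mcse_{request}",
                f"pareto_khat_{request}",
                f"min_ss_{request}",
            ]
        elif isinstance(request, float) and 0 < request < 1:
            name = f"q{request * 100:g}"
            columns += [name, f"mcse_{name}"]
        else:
            raise ValueError(f"unsupported estimate: {request!r}")
    return columns


print(columns_for(["mean", 0.5]))
```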

Basic ESS is only relevant when the mean is requested, but if MCSE of mean is already provided, I don't think it's particularly useful. In PosteriorStats, we also use an MCSE-derived heuristic to limit the printed precision of the requested estimate.
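
To illustrate what an MCSE-derived precision limit might look like, here is one possible rule: keep digits only down to the decimal place of the leading digit of the MCSE. The exact heuristic used by PosteriorStats is not described in this thread, so the rule and the name `round_to_mcse` are assumptions.

```python
import math


def round_to_mcse(estimate, mcse):
    """Round `estimate` to the decimal place of the MCSE's leading digit.

    Assumed rule for illustration only. Falls back to the unrounded
    estimate when the MCSE is not a positive finite number.
    """
    if not (math.isfinite(mcse) and mcse > 0):
        return estimate
    decimals = -math.floor(math.log10(mcse))
    return round(estimate, decimals)


print(round_to_mcse(1.23456, 0.0123))
print(round_to_mcse(1234.5, 30.0))
```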

Projects
Status: Planned

3 participants