Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different order of parameters in MCMC chain vs. MAP result #1322

Closed
ElOceanografo opened this issue Jun 10, 2020 · 4 comments
Closed

Different order of parameters in MCMC chain vs. MAP result #1322

ElOceanografo opened this issue Jun 10, 2020 · 4 comments

Comments

@ElOceanografo
Copy link
Contributor

First off, the new MAP functionality is great--something I've been waiting for a while, thanks!

I was playing around with the example in the docs and noticed that the MAP result places the variables in a different order than what is returned in the MCMC chain:

using Turing, Optim
@model function gdemo(x)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))

    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))
    end
end

model = gdemo(randn(10))
map_estimate = optimize(model, MAP())
names(coef(map_estimate))[1] # s, m
chain1 = sample(model, NUTS(), 1000)
names(DataFrame(chain)) # m, s

It looks like they are alphabetized in the chain, but printed in order of definition in the model in the MAP result. This caused me some confusion till I noticed it; I think it would probably be better to have them in the same order...

@harryscholes
Copy link
Contributor

harryscholes commented Jun 29, 2020

This doesn't solve your problem, but the behaviour of DataFrame(chain) has changed in MCMCChains v4.0.0. TuringLang/MCMCChains.jl#193 implemented the Tables interface. At the same time, the dataframes-compat.jl code was removed, which allowed the user to select which sections of parameters to use as columns in a DataFrame.

On Turing master branch the behaviour is now:

using Turing, Optim, StatsBase, DataFrames

@model function gdemo(x)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))

    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))
    end
end

model = gdemo(randn(10))
map_estimate = optimize(model, MAP())
names(coef(map_estimate))[1] # `s, m`
chain = sample(model, NUTS(), 100)
names(DataFrame(chain)) # NB not `m, s` anymore
julia> names(DataFrame(chain)) # NB not `m, s` anymore
16-element Array{String,1}:
 "iteration"
 "chain"
 "acceptance_rate"
 "hamiltonian_energy"
 "hamiltonian_energy_error"
 ⋮
 "nom_step_size"
 "numerical_error"
 "s"
 "step_size"
 "tree_depth"

@devmotion
Copy link
Member

Part of the redesign of MCMCChains tries to remove the amount of "automagic" that was present in MCMCChains, which could lead to surprising behaviour and type instabilities. Instead users have to subset the chains now in some cases explicitly. Usually that's possible without much additional code and IMO highlights more clearly what you want to achieve. Moreover, it allows (and slightly enforces) to write more efficient code since now subsets are not computed anymore everytime you calculate something but instead you can/should subset the chains once and then perform your calculations without any additional modification of the underlying arrays.

In this case you might want to only extract the parameters by writing

chain = sample(model, NUTS(), 100)
chains_only_params = Chains(chain, :parameters)
names(DataFrame(chains_only_params))

@devmotion
Copy link
Member

I think the main issue here (different order of parameters in MCMC chain vs MAP result) could be fixed by using natural sort order for MAP results as well (which could be disabled optionally by setting sorted = false, analogously to the behaviour of sample).

@Red-Portal
Copy link
Member

I'll close this since this seems to have been resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants