-
-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conducting more specific comparisons #751
Comments
Hey @jt-lab this is great to hear! You are right, this is currently not possible. I once opened PR #708 to return the data used to generate predictions and the arviz inference data all in a pandas dataframe. This would allow you to perform various group by and aggregations (and if you don't like pandas method chaining, I would suggest polars or tidy polars). I'm not sure why I closed the PR, but I will reopen it and rebase with main. Initially, I think returning such a data object would make the most sense without introducing a lot of new code. {marginaleffects} allows users to pass there own functions to compute differences, ratios, etc., but even that does not allow you to compute between level and hierarchical comparisons. This is interesting! While I think of the best solution, I would pull those changes (once I rebase that PR) and compute the desired comparisons yourself. Thanks for communicating with me about this! 😄 |
@GStechschulte, that sounds great!
I have no idea what (tidy) polars are... will google it. But chaining pandas methods sounds good! Many thanks! |
I just saw tidypolars and I'll drop this here https://tomicapretto.github.io/posts/2022-06-26_tidypolars/ haha |
We gave it a try with return_idata=True argument. However, the kernel dies due to memory running full. Even with ~100GB swap space it won't run through on our data. When we thin the original idata trace (from inference) to only 10 draws per chain, it runs through (eating up up to 8 GB RAM). Curiously, the returned pandas DataFrame goes up to 3600 draws in the draw column. What is the correspondence between draws in the input idata and the output idata? Do you have any suggestions how to deal with this? Many thanks! |
@jt-lab wow! The number of draws in the draw column of the dataframe depends on the number of chains and number of draws specified when you fit the model. The correspondence between draws and observed data is such that the idata is merged with the observed data that produced that draw. E.g. row 5 of the observed data corresponds to chain 3 draw 500. We realized the memory limitation (and your comment confirms this) of returning a DataFrame. Therefore, I have been working on returning an Then, since this object relies on xarray, you can use xarray for multidimensional indexing, aggregations, group by, etc. I should be able to commit some updates in the next day. I also have an example notebook on how you can use xarray to perform the cross comparisons you described above. |
You write the group would be called Concerning
After thinning the trace to 10 draws per chain, the resulting df was 3600 draws. So apparently it did not consider the current draws in the idata object (but perhaps from some stored value of the original number of draws). Curiously it was so much faster an only used <10% of the memory compared to using the not-thinned idata object. So the thinning did have some effect. Also, the original trace contained 4000 samples... so I'm really not sure where the 3600 comes from ... As always, many thanks! |
@jt-lab Ahhh yes. It is misleading. Thanks! Hmmm. Interesting. Constructing the dataframe required a couple joins so perhaps a bug is being introduced there. However, since I am pretty confident we will be going ahead with the InferenceData object, I won't pursue this (and you shouldn't worry too much about 😄) |
@jt-lab I have pushed a change to the |
@GStechschulte thanks a lot for including these changes. That's awesome! I just tried to do some comparisons with our data and realized that the inference data that is returned contains twice as many observations as the dataset. Also, when I tried to calculate specific comparisons, I noticed that the values in the DataArray are duplicated. In my code, I used average_by with interpret.comparisons and no conditional variables. Maybe something is going wrong when using average_by to return the inference data, or am I missing something? Many thanks! |
@AylinH thanks! Could you share the code and data where this is happening please? Nonetheless, this behavior can happen. If you fit a model with 100 rows, this is the observed data used to fit the model. However, when calling For example, the cartesian product of two sets |
@GStechschulte thanks for your quick response! I used the simulated data by @jt-lab. In the data set, we have two within-participants factors (factor2: X,Y and factor3: 1,2) and one between-participants factor (factor1: A, B, C). Within each level of factor1, there are 9 participants for each respective condition. This is the data set: This is the code I used (adapted from your example) import pandas as pd
import bambi as bmb
import seaborn as sns
import xarray as xr
import numpy as np
data = pd.read_csv('simulated_data_comparisons.csv')
priors = {
"factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
"factor1": bmb.Prior("Normal", mu=0, sigma=1),
"factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}
model = bmb.Model(
"p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)",
data,
family="binomial",
categorical=["factor3", "individual", "factor1", "factor2"],
priors=priors,
noncentered=False)
idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
target_accept=0.9, idata_kwargs={'log_likelihood':True})
data['report frequency (%)'] = data['correct'] / data['count']
df, idata = bmb.interpret.comparisons(
model=model,
idata=idata,
average_by=['factor3', 'factor1'],
contrast={"factor2": ["X", "Y"]},
return_idata=True
)
xr_df = xr.Dataset.from_dataframe(df)
del idata.observed_data
idata.add_groups(data=xr_df)
idata.data = idata.data.rename({"index": "p(correct, count)_obs"})
def select_draws(idata, condition_idx):
draws = idata.posterior.sel(**{"p(correct, count)_obs": condition_idx})["p(correct, count)_mean"]
draws = draws.assign_coords({"p(correct, count)_obs": np.arange(len(condition_idx))})
return draws
idx_11 = np.where((idata.data.factor2 == "Y") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]
idx_41 = np.where((idata.data.factor2 == "X") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]
draws_1 = select_draws(idata, idx_11)
draws_4 = select_draws(idata, idx_41)
diff = (draws_4 - draws_1).mean(("chain", "draw"))
diff The returned inference data comprises 216 observations (in the data set there are 108). This is the output of the comparison: The values in the blue rectangle are the same as the values in the orange rectangle. Also, just based on the data set, I would have expected to see 9 values in the data array. Is this related to your explanation regarding the Cartesian product? Thank you very much! |
@GStechschulte, for further testing, we have tried to manually recreate a comparison based on the returned idata which is also returned by interpret.comparisons. The mean was as expected, but the HDI boundaries were different. Is this perhaps due to how the random effect 'individual' is handled? (Note: We have removed redundant predictions, see post above.) Do you have any idea what might go wrong? Code: import pandas as pd
import bambi as bmb
import seaborn as sns
import xarray as xr
import numpy as np
import arviz as az
data = pd.read_csv('simulated_data_comparisons.csv')
priors = {
"factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
"factor1": bmb.Prior("Normal", mu=0, sigma=1),
"factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}
model = bmb.Model(
"p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)",
data,
family="binomial",
categorical=["factor3", "individual", "factor1", "factor2"],
priors=priors,
noncentered=False)
idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
target_accept=0.9, idata_kwargs={'log_likelihood':True})
data['report frequency (%)'] = data['correct'] / data['count']
df, idata = bmb.interpret.comparisons(
model=model,
idata=idata,
average_by=['factor3', 'factor1'],
contrast={"factor2": ["X", "Y"]},
return_idata=True
)
#diplay(idata)
xr_df = xr.Dataset.from_dataframe(df)
del idata.observed_data
idata.add_groups(data=xr_df)
idata.data = idata.data.rename({"index": "p(correct, count)_obs"})
def select_draws(idata, condition_idx):
draws = idata.posterior.sel(**{"p(correct, count)_obs": condition_idx})["p(correct, count)_mean"]
draws = draws.assign_coords({"p(correct, count)_obs": np.arange(len(condition_idx))})
return draws
subset1 = np.where((idata.data.factor2 == "X") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]
subset2 = np.where((idata.data.factor2 == "Y") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]
draws_1 = select_draws(idata, subset1)
draws_2 = select_draws(idata, subset2)
diff = (draws_2 - draws_1).sel(**{"p(correct, count)_obs": range(0,18,2)}).mean("p(correct, count)_obs") ## removing redundant predictions
az.plot_posterior(diff)
summary_df = bmb.interpret.comparisons(
model=model,
idata=idata,
average_by=['factor3', 'factor1'],
contrast={"factor2": ["X", "Y"]},
)
display(summary_df) |
@AylinH thank you for the messages and I haven't ignored the first one. I have been busy as I am in between switching jobs. I will get to these messages by the end of the weekend. Thanks! 👍🏼 |
@AylinH This is because of a bug in the function that prepares the unit level data. The function was not dropping duplicate rows after creating the unit level data. If you do I will open a PR to resolve that. |
@AylinH This "discrepancy" is happening because in the When specifying So when you selected the draws diff = (draws_2 - draws_1).sel(**{"p(correct, count)_obs": range(0,18,2)}).mean("p(correct, count)_obs") you were only selecting a subset of the draws that corresponded to the following data:
and without performing a group by aggregation. So you should first compute unit level comparisons with all draws and then apply your group by aggregation. Good questions, thanks! By the way, I am adding some helper functions to make a lot of this easier. I hope to have a draft PR by the end of the weekend. |
@GStechschulte thank you very much for your explanations! We have tried to calculate the comparisons as you have explained. However, our HDI boundaries have not changed from our previous post and are still different from the This is our code (with comments for clarification): import bambi as bmb
import pandas as pd
import xarray as xr
import numpy as np
import arviz as az
data = pd.read_csv('simulated_data_comparisons.csv')
priors = {
"factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
"factor1": bmb.Prior("Normal", mu=0, sigma=1),
"factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}
model = bmb.Model(
"p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)",
data,
family="binomial",
categorical=["factor3", "individual", "factor1", "factor2"],
priors=priors,
noncentered=False)
idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
target_accept=0.9, idata_kwargs={'log_likelihood':True})
df, idata_pred = bmb.interpret.comparisons(
model=model,
idata=idata,
average_by=['factor3', 'factor1'],
contrast={"factor2": ["X", "Y"]},
return_idata=True
)
xr_df = xr.Dataset.from_dataframe(df)
del idata_pred.observed_data
idata_pred.add_groups(data=xr_df)
idata_pred.data = idata_pred.data.rename({"index": "p(correct, count)_obs"})
mask_X = (df["factor2"]=="X")
values_X = idata_pred.posterior["p(correct, count)_mean"][:,:, mask_X.values].values
mask_Y = (df["factor2"]=="Y")
values_Y = idata_pred.posterior["p(correct, count)_mean"][:,:, mask_Y.values].values
diff = values_Y - values_X # Difference on the unit-level
diff_subset = diff[:,:,((df["factor1"]=="A") & (df["factor3"]==1)).values[mask_X]] # mask_X indices in valid order for this
mean_diff = np.mean(diff_subset, axis=2) # Mean difference over units
summary_df = bmb.interpret.comparisons(
model=model,
idata=idata_pred,
average_by=['factor3', 'factor1'],
contrast={"factor2": ["X", "Y"]},
)
display(summary_df)
az.plot_posterior(mean_diff) We are happy about any suggestions. Many thanks! |
@GStechschulte that is awesome, many thanks for working on these modifications! I have just calculated our comparison using your new approach, which worked well. However, the HDI discrepancy is still there (the same HDIs as for the above codes). Could you please explain to us which HDIs are the correct ones? Any help would be greatly appreciated. This is the code: import warnings
import arviz as az
import numpy as np
import pandas as pd
import xarray as xr
import bambi as bmb
from bambi.interpret.helpers import data_grid, select_draws
bmb.config["INTERPRET_VERBOSE"] = False
warnings.simplefilter(action='ignore', category=FutureWarning)
data = pd.read_csv('simulated_data_comparisons.csv')
priors = {
"factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
"factor1": bmb.Prior("Normal", mu=0, sigma=1),
"factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}
model = bmb.Model(
"p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)",
data,
family="binomial",
categorical=["factor3", "individual", "factor1", "factor2"],
priors=priors,
noncentered=False)
idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
target_accept=0.9, idata_kwargs={'log_likelihood':True})
# data grid
conditional = {
"factor1": np.array(["A"]), # Between-individuals factor
"factor3": np.array([1]),
"individual": np.arange(18, 27) # All individuals who are in factor1=A
}
variable = {"factor2": np.array(["X", "Y"])}
new_data = data_grid(model, conditional, variable)
idata_new = model.predict(idata, data=new_data, inplace=False)
# comparison
draw_1 = select_draws(idata_new, new_data, {"factor2": "X"}, "p(correct, count)_mean")
draw_2 = select_draws(idata_new, new_data, {"factor2": "Y"}, "p(correct, count)_mean")
diff = (draw_2 - draw_1).mean("p(correct, count)_obs")
summary_df = bmb.interpret.comparisons(
model=model,
idata=idata,
average_by=['factor3', 'factor1'],
contrast={"factor2": ["X", "Y"]},
)
display(summary_df)
az.plot_posterior(diff) Data: Also, we were wondering if it is possible to specify the required contrast directly in the Thanks a lot! |
They are both returning the correct value. It is just that you are taking the mean along different dimensions and then passing that to Usually, when analyzing the posterior draws, you want to reduce the chain and draw dimensions first via ArviZ has a nice post about working the InferenceData object. Additionally, see this PyMC forum question. 👍🏼 Adapting your example: # same value as estimate column in 'summary_df'
diff = (draw_2 - draw_1).mean(("chain", "draw"))
np.mean(diff)
Computing the 94% credible interval # same values as CI columns in 'summary_df'
lower_mean = np.mean(az.hdi((draw_2 - draw_1)).sel(hdi='lower')["p(correct, count)_mean"])
higher_mean = np.mean(az.hdi((draw_2 - draw_1)).sel(hdi='higher')["p(correct, count)_mean"])
lower_mean, higher_mean
and comparing these results with the summary dataframe: summary_df = bmb.interpret.comparisons(
model=model,
idata=idata,
contrast={"factor2": ["X", "Y"]},
conditional=conditional,
average_by=['factor3', 'factor1'],
prob=0.94,
use_hdi=True
)
summary_df
|
@GStechschulte, thanks for the further explanations and all the effort. The What we calculated manually was the mean over all unit-level distributions and the mean and hdis of the resulting distribution. I think the crucial point is that we were unaware of this:
So far we have mostly worked with models where the parameter posteriors can be interpreted more directly without going through predictions. And when working with posteriors it's always "summarize last" (e.g. take differences and means first, then get mean/hdis to summarize the distribution). I must say it's not really clear to me why when working with posterior prediction draws it is to "reduce first". Our goal is to make inferences based on the comparisons (that optimally generalize to the population). I looked into some of the literature listed on the {marginal effects} page but so far this was not super helpful. Hence, if anyone has suggestion for background on this ... Edit: Just found the reference to the Gelman, Hill, Vehtari Regression book in the one of the example notebooks... will have a look at that! many thanks again for the support! |
@jt-lab yes, but this is only because we computed unit-level comparisons and passed an argument to # mean comparison
diff = (draw_2 - draw_1).mean(("chain", "draw"))
# lower and higher bound of that comparison
lower_mean = az.hdi((draw_2 - draw_1)).sel(hdi='lower')["p(correct, count)_mean"]
higher_mean = az.hdi((draw_2 - draw_1)).sel(hdi='higher')["p(correct, count)_mean"]
Since your model has categorical variables and interaction effects, the dimensionality of the posterior is quite large: idata_new["posterior"].dims
but lets imagine for a second that it is
The coordinate (draw_2 - draw_1).mean(("chain", "draw"))
# and
az.hdi((draw_2 - draw_1)) which gives us the mean and uncertainty for each unit (draw_2 - draw_1).mean(("chain", "draw")).shape, az.hdi((draw_2 - draw_1)).shape
I think it also partly depends on what type of posterior summary you are wanting to compute. But in the case of |
Yes, I think that's the crucial question. But I'd say what we want is the use case for 99% of users from our field (and probably all of behavioral sciences) would have: |
I've been trying to follow the conversation. Initially I thought we were doing something wrong in the In the meantime, I think using |
Hi all, I just came across these super helpful blog posts by Andrew Heiss. You probably know them, but just in case: [1] https://www.andrewheiss.com/blog/2022/11/29/conditional-marginal-marginaleffects/ I think our current approach is going for average population-level outcomes in the way in the "average / marginalize / integrate across existing random effects", see [1]. While reading these blog post I have been wondering again who Bambi handles the random effects in predictions. Tee R module there is the re_formula argument that allows user to specify what to do with REs. But perhaps Bambi's predict implicitely decides this based on whether new groups are predicted, is the RE factor is in the average_by, etc. But clarification would be great! |
@jt-lab yes, these blogs are fantastic. I am re-reading them today.
We have recently added docs on predicting new groups when specifying group effects in a Bambi model. This notebook only describes how this is achieved in |
@jt-lab @AylinH I have reviewed the blog posts. In the blog Marginal and Conditional Effects for GLMMs there are two quantities of interest:
In your case, I would:
|
@GStechschulte, sorry for the slow response and thanks for explanations and the notebook. |
@GStechschulte,
we keep using your interpret submodule to it's full extent :-D
We came across two types of comparisons we would like to make but are unsure if/how they are possible. So probably this is not really an issue but rather a feature request ...
There are two related (but slightly different) use cases:
We would like to conduct a kind of second-level comparison within a factor. I.e., in the example below we would now like to compare the differences of between different levels of factor 1 (see red markings) :
And in a slightly different scenario, we would like to compare the differences of e.g. factor1=A in factor3=1 with factor1=B in factor3=2. That is, the levels would be different and it would be also across the levels of one factor. This might help to clarify:
Are such comparisons possible somehow? If there was a way to get the predicted samples on which the table is based, we could perhaps calculate these differences ...
As always we are happy about any suggestions / input
Many thanks!
The text was updated successfully, but these errors were encountered: