Bug fix for plotting order in plot_comparison (fixes mismatch between labels and plot) #731

jt-lab · 2023-09-28T11:27:25Z

A call like ...

bmb.interpret.plot_comparisons(
    model=model,
    idata=trace,
    contrast={"factor1": ["X", "Y"]},
    conditional={"factor2": ["C", "A", "B"], "factor3": [1, 2]},
)

currently leads to a mismatch between the order in which the levels of factor2 are plotted on the x axis and the order of the x-axis labels!

I have traced this down to .values.dtype.categories returning the names in sorted order which is eventually used for the labels. The actual plot is based on the user provided order.
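The sorting behaviour described above can be reproduced with pandas alone. This is a minimal sketch with made-up level names; it does not call Bambi and only assumes that the categories are inferred from the data rather than passed explicitly:

```python
import pandas as pd

# Levels in the order a user might pass them (illustrative values)
user_levels = ["C", "A", "B"]

# When categories are inferred from the data, pandas sorts them if possible
s = pd.Series(user_levels, dtype="category")
categories = list(s.values.dtype.categories)

print(categories)   # inferred (sorted) categories
print(user_levels)  # order the user asked for

# Passing the categories explicitly keeps the user-provided order
s_ordered = pd.Series(pd.Categorical(user_levels, categories=user_levels))
kept = list(s_ordered.values.dtype.categories)
print(kept)
```

So any label list derived from the inferred categories can disagree with data that is laid out in the user-provided order.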

This commit is only a quick fix: it sorts the user-provided levels so that their order matches the labels, which avoids false results. It would be better to give the user control over the order by using the order of the user-provided list for the labels, but a change like that seems to interact with further code which I don't fully understand.

Moreover, I'm not entirely sure what the benefit of the call above is compared to:

bmb.interpret.plot_comparisons(
    model=model,
    idata=trace,
    contrast={"factor1": ["X", "Y"]},
    conditional=["factor2", "factor3"],
)

I thought it was to give the user control over which levels are plotted (and in which order). But removing levels from the list leads to errors (and the order is not applied). I might be missing something, but if there is no additional functionality compared to the list version, perhaps the dict version should be removed (for now).

I'm sorry for the many incremental commits just for fixing spaces and typos. I could not work with a local version and had to make changes in GitHub's web interface (and test the code on a server).

tomicapretto · 2023-09-28T16:19:35Z

@jt-lab thanks for the contribution! Can I ask for an example of how this looked before and how it looks after the change? Thanks!

Bonus: If it comes with data so I can test on my end, even better!

jt-lab · 2023-09-28T17:32:15Z

> @jt-lab thanks for the contribution! Can I ask for an example of how this looked before and how it looks after the change? Thanks!
>
> Bonus: If it comes with data so I can test on my end, even better!

Here you go:
Code (note the two different orders of the levels in the conditional argument of plot_comparisons):

import pandas as pd
import bambi as bmb
import matplotlib.pyplot as plt

df = pd.read_csv('simulated_data_order_prob.csv')

model = bmb.Model('p(correct, count) ~ factor1*factor2*factor3 + (factor3+factor2|individual)',
                  df, family='binomial')
trace = model.fit(tune=500, draws=1000, target_accept=0.7, init='adapt_diag')

f, ax = plt.subplots(2)
bmb.interpret.plot_comparisons(
    model=model,
    idata=trace,
    contrast={"factor2": ["X", "Y"]},
    conditional={"factor1": ["A", "B", "C"], "factor3": [1, 2]},
    ax=ax[0]
) 
bmb.interpret.plot_comparisons(
    model=model,
    idata=trace,
    contrast={"factor2": ["X", "Y"]},
    conditional={"factor1": ["C", "B", "A"], "factor3": [1, 2]},
    ax=ax[1]
)

Plot before the fix:
(e.g.: git+https://github.com/jt-lab/bambi.git@a072c84809fe62e6efe1d7818f3d5d28d81e00ed)

After the fix:
(e.g.: git+https://github.com/jt-lab/bambi.git)

Data set:
simulated_data_order_prob.csv
This is the same simulated data I also posted in a question on the PyMC Discourse. I just added a further level for factor1 and matched the factor and level names with my original description of the problem.

GStechschulte · 2023-09-28T19:01:52Z

@jt-lab thanks for finding the subtle bug and opening a PR (good attention to detail)👍🏼. The difference between

# user provided values only (no default values are computed for "factor1" and "factor3")
bmb.interpret.plot_comparisons(
    model=model,
    idata=trace,
    contrast={"factor2": ["X", "Y"]},
    conditional={"factor1": ["C", "A", "B"], "factor3": [1, 2]},
)

and

# computes default values for "factor1" and "factor3"
bmb.interpret.plot_comparisons(
    model=model,
    idata=trace,
    contrast={"factor2": ["X", "Y"]},
    conditional=["factor1", "factor3"],
)

is that in the latter code snippet, since no values are passed for factor1 and factor3, default values are computed for these covariates. In the Default contrast and conditional values section of the plot_comparisons docs, these details are outlined.

Thanks!

Also, you are right. This is because in def _plot_categoric(...), to determine the x-axis values we call get_unique_levels(plot_data[main]), which in turn uses np.unique to return the unique elements. By default, np.unique also sorts this array. Then, once ax.set_xticklabels(main_levels) is called, the bug is introduced: the sorted labels are attached to points drawn in the user-provided order.
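The mechanism can be shown without Bambi or matplotlib. The following is only a sketch: the level names and estimate values are made up, and the positional pairing stands in for what happens when sorted tick labels are placed on points drawn in user order.

```python
import numpy as np

# Order passed by the user, and made-up estimates keyed by level
user_levels = ["C", "A", "B"]
estimates = {"C": 0.30, "A": 0.10, "B": 0.20}

# Points are drawn in the user-provided order ...
plotted = [estimates[level] for level in user_levels]

# ... but the tick labels come from np.unique, which returns sorted values
tick_labels = np.unique(user_levels).tolist()

# Pairing labels and points by position attaches the wrong label to each point
mislabelled = dict(zip(tick_labels, plotted))
print(mislabelled)

# Sorting the user-provided levels first (the approach taken in this PR)
# makes both sequences agree again
sorted_levels = sorted(user_levels)
relabelled = dict(zip(tick_labels, [estimates[level] for level in sorted_levels]))
print(relabelled)
```

The alternative, keeping the user order for both the points and the labels, would require an order-preserving replacement for np.unique.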

jt-lab · 2023-09-28T20:29:23Z

@GStechschulte, thanks for the explanations! I'm not sure I understand the purpose of the dictionary conditional argument. When I list all levels of the factors, I get the same result as with the list version's defaults. When I list a subset of the levels, I get errors.

Unrelated: When reading through the plot_comparisons doc, I noticed this sentence: "Bambi uses the ordering (keys if dict and elements if list)...". I'm not sure I understand correctly, but dictionaries don't guarantee any order of the items or keys. So this might also be a source of unexpected behavior, and the user might need to pass an OrderedDict to be safe. But maybe I'm missing something!

Many thanks again!
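On the ordering question: since Python 3.7, insertion order of a plain dict is guaranteed by the language, so an OrderedDict should not be needed for this purpose. A minimal sketch follows; the positional assignment to "main", "group", and "panel" is an illustration of what the quoted docs sentence describes, not Bambi's actual code.

```python
# A plain dict keeps its keys in insertion order (guaranteed since Python 3.7)
conditional = {"factor2": ["C", "A", "B"], "factor3": [1, 2]}
keys = list(conditional)
print(keys)

# Illustrative only: assign roles by key position; zip stops at the
# shorter sequence, so no "panel" role is assigned here
roles = dict(zip(["main", "group", "panel"], keys))
print(roles)
```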

GStechschulte · 2023-09-30T09:01:45Z

> I'm not sure I understand the purpose of the dictionary conditional argument. When I list all levels of the factors, I get the same as with the list version defaults

This is due to the data types of the data being used to fit the model.

When you pass the list ["factor1", "factor3"] for the conditional arg., factor1 is assigned as the "main" covariate and factor3 is assigned the "group" covariate. Then, since you don't specify any values to be associated with the factor covariates, a default value is computed for them. Default values for "main", "group", and "panel" are the following pseudocode:

if v == "main":
    
    if v == numeric:
        return np.linspace(v.min(), v.max(), 50)
    elif v == categorical:
        return np.unique(v)

elif v == "group":
    
    if v == numeric:
        return np.quantile(v, np.linspace(0, 1, 5))
    elif v == categorical:
        return np.unique(v)

elif v == "panel":
    
    if v == numeric:
        return np.quantile(v, np.linspace(0, 1, 5))
    elif v == categorical:
        return np.unique(v)

In your case, "main" and "group" are both categorical data types. Thus, the default values are the unique values.
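For readers who want to try the defaults on their own data, the pseudocode can be written as a small runnable function. This is a sketch under assumptions: default_values is a hypothetical helper, not part of Bambi, and the toy columns are made up.

```python
import numpy as np
import pandas as pd


def default_values(values: pd.Series, role: str) -> np.ndarray:
    """Hypothetical helper mirroring the pseudocode; not Bambi's code."""
    if role not in ("main", "group", "panel"):
        raise ValueError(f"unknown role: {role!r}")
    if pd.api.types.is_numeric_dtype(values):
        if role == "main":
            # Numeric "main": an evenly spaced grid over the observed range
            return np.linspace(values.min(), values.max(), 50)
        # Numeric "group" or "panel": five quantiles of the observed values
        return np.quantile(values, np.linspace(0, 1, 5))
    # Categorical covariates: the (sorted) unique levels, whatever the role
    return np.unique(values)


# Toy data; "x" is a made-up numeric covariate
df = pd.DataFrame({"factor1": ["C", "A", "B", "A"], "x": [0.0, 1.0, 2.0, 4.0]})
main_cat = default_values(df["factor1"], "main")
main_num = default_values(df["x"], "main")
group_num = default_values(df["x"], "group")
print(main_cat, len(main_num), len(group_num))
```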

When you pass a dict to the conditional arg., the keys of the dict are taken as the "main", "group", and "panel" covariates. Thus, {"factor1": ["A", "B", "C"], "factor3": [1, 2]} corresponds to {"main": "factor1", "group": "factor3"}. Since you passed values for the covariates, no default values are computed.

Then, since you ended up passing all the unique values of factor1 and factor3

df["factor1"].unique(), df["factor3"].unique()

(array(['A', 'B', 'C'], dtype=object), array([1, 2]))

as the dict values, passing the list to conditional and passing the dict (stated above) generate the same data. If you were to pass {"factor1": ["A", "B"], "factor3": [1]}, the two calls would give different results.

Lastly, regarding

> When I list a subset of the levels, I get errors.

This is due to a bug in the function def get_unique_levels(...) of utils.py. The code in the if block wasn't returning the unique levels, it was returning all levels. Thus, the reason for the shape error. Once the function returns unique levels

def get_unique_levels(x: np.ndarray) -> np.ndarray:
    return np.unique(x)

there are no more shape errors. As an aside, if factor3 should be a categorical data type, you should explicitly convert it, e.g. pd.Categorical(data["factor3"], ordered=True). Otherwise, Bambi and the interpret sub-package functions will treat it as numeric since it takes the values [1, 2].
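The conversion can be checked on a toy frame before refitting the model. This sketch uses made-up data and only pandas; how Bambi treats the column afterwards is as described above, not verified here.

```python
import pandas as pd

# Toy data: factor3 is stored as integers, so pandas sees it as numeric
data = pd.DataFrame({"factor3": [1, 2, 2, 1]})
was_numeric = pd.api.types.is_numeric_dtype(data["factor3"])

# Explicit conversion to an ordered categorical
data["factor3"] = pd.Categorical(data["factor3"], ordered=True)
is_categorical = isinstance(data["factor3"].dtype, pd.CategoricalDtype)
levels = list(data["factor3"].cat.categories)

print(was_numeric, is_categorical, levels)
```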

I hope this helps 👍🏼

GStechschulte · 2023-09-30T09:54:04Z

Should I make a separate PR incorporating @jt-lab commits and my local changes to fix the bugs? Or should @jt-lab keep this PR and then commit my changes to fix the bugs? @tomicapretto

tomicapretto · 2023-10-01T12:15:06Z

@GStechschulte if your fixes are related to what @jt-lab is doing here, I think @jt-lab could allow this PR to be edited by maintainers and then you could contribute here so everything is in place and we respect the original authorship. But if they are separate things, you could open a different PR.

But if you think that's too much, feel free to do something else.

jt-lab · 2023-10-01T12:28:23Z

@tomicapretto, how can I make it editable?

tomicapretto · 2023-10-01T16:06:12Z

I think it's already enabled

GStechschulte · 2023-10-03T04:38:46Z

I have pushed 3 commits to your branch. Now, the plots work with no errors. I also decided that your original commit to sort the dict values in ascending order is the most appropriate. In particular, the x-axis of graphs should be in ascending order $1, 2, 3,...,n$ or $A, B, C,...,n$.

Thanks a lot for raising the issue and for opening a PR 😄 It is much appreciated!

Below are a few of your examples:

f, ax = plt.subplots(2)
bmb.interpret.plot_comparisons(
    model=model,
    idata=trace,
    contrast={"factor2": ["X", "Y"]},
    conditional={"factor1": ["A", "B", "C"], "factor3": [1, 2]},
    ax=ax[0]
) 
bmb.interpret.plot_comparisons(
    model=model,
    idata=trace,
    contrast={"factor2": ["X", "Y"]},
    conditional={"factor1": ["C", "B", "A"], "factor3": [1, 2]},
    ax=ax[1]
)
plt.tight_layout();

bmb.interpret.plot_comparisons(
    model=model,
    idata=trace,
    contrast={"factor2": ["X", "Y"]},
    conditional={"factor1": ["A", "C"], "factor3": [1, 2]},
    fig_kwargs=dict(figsize=(7, 3))
);

bmb.interpret.plot_comparisons(
    model=model,
    idata=trace,
    contrast={"factor2": ["X", "Y"]},
    conditional=["factor1", "factor3"],
    fig_kwargs=dict(figsize=(7, 3))
);

tomicapretto · 2023-10-03T11:40:41Z

@GStechschulte @jt-lab ok I struggled a little bit with git and the fact that we're working on @jt-lab's main branch (I used a different name locally). Nevertheless, it seems everything is working. I just made a small modification to the if-else structure, I think it's clearer now, and I also wrote a fix for the HSGP and the failing tests. We can merge once CI passes.

GStechschulte · 2023-10-03T11:57:34Z

@tomicapretto Ahh great! And thanks for the HSGP fix 👍🏼

GStechschulte · 2023-10-03T12:30:46Z

Mmmm. The test_plots.py tests passed locally before I pushed those three commits (Tomas's commits shouldn't cause them to fail). I will look into it this evening.

tomicapretto · 2023-10-03T12:38:29Z

> Mmmm. The test_plots.py tests passed locally before I pushed those three commits (Tomas's commits shouldn't cause them to fail). I will look into it this evening.

The problem happens when we test this

plot_slopes(model, idata, wrt={"Days": 2}, conditional={"Subject": 308})

and it's because it's calling sorted(308), which cannot be sorted because it's not iterable. I'm fixing it now :)
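The failure and one way around it can be reproduced in plain Python. This is a minimal sketch: listify here is a hypothetical stand-in written for illustration, and it only handles lists, tuples, and scalars (not, for example, NumPy arrays).

```python
def listify(value):
    """Hypothetical helper: wrap a scalar in a list, copy lists and tuples."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


conditional = {"Subject": 308, "factor1": ["C", "A", "B"]}

# Sorting a bare scalar fails because an int is not iterable
try:
    sorted(conditional["Subject"])
    scalar_sort_failed = False
except TypeError:
    scalar_sort_failed = True

# Wrapping every value in a list first makes sorting safe for both cases
sorted_conditional = {
    key: sorted(listify(value)) for key, value in conditional.items()
}
print(scalar_sort_failed, sorted_conditional)
```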

codecov-commenter · 2023-10-03T13:39:02Z

Codecov Report

Merging #731 (8d7e161) into main (e53f8da) will decrease coverage by 0.01%.
Report is 4 commits behind head on main.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##             main     #731      +/-   ##
==========================================
- Coverage   89.56%   89.55%   -0.01%     
==========================================
  Files          44       44              
  Lines        3525     3524       -1     
==========================================
- Hits         3157     3156       -1     
  Misses        368      368

Files                            Coverage Δ
bambi/backend/terms.py           96.65% <100.00%> (ø)
bambi/interpret/create_data.py   100.00% <100.00%> (ø)
bambi/interpret/utils.py         87.37% <100.00%> (-0.32%) ⬇️
bambi/models.py                  79.22% <ø> (ø)
bambi/priors/scaler.py           96.70% <100.00%> (ø)
bambi/interpret/plot_types.py    73.39% <75.00%> (ø)
bambi/interpret/plotting.py      85.40% <83.33%> (+0.43%) ⬆️

... and 1 file with indirect coverage changes

… labels and plot) (bambinos#731)
Co-authored-by: GStechschulte
Co-authored-by: Tomas Capretto

jt-lab added 13 commits September 21, 2023 13:51

Added missing doc string part for center_predictors

e280507

Perhaps the rationale behind centering could be added to help user decide whether to turn it on or not. But I'm not entirely sure about this.

Clarified that centering affects prior interpretation in models.py

d40a1d8

Merge branch 'bambinos:main' into main

ac7161f

Applied black to make CI happy. Problem must have been some invisible…

d6a6b7c

… character, or so. Also deleted superflous space unrelated to the black's complaint

Added --diff and --color to black in test.yml

a072c84

... for more informative output.

Fix for plotting order error

b79673a

Still fixing plot order

314a4dc

Another typo

4217a49

Cannot make changes locally at the moment. Hence so many commits ...

Merge branch 'bambinos:main' into main

bf93c24

Black found some spaces ...

137464b

Arrg, more typos

b62cc89

It's a bit annoying, I can only work from the web interface currently, but hopefully all typos are resolved now

Making pylint happy

da5085e

Merge branch 'bambinos:main' into main

84cdc0c

GStechschulte added 3 commits October 3, 2023 06:26

add .drop_duplicates() before returning dataframe

7bdd001

use np.unique instead of get_unique_levels func (and remove)

7ca9762

sort values if conditional dict in plot_comparisons and plot_slopes

cdd40df

GStechschulte added the bug label Oct 3, 2023

GStechschulte requested a review from tomicapretto October 3, 2023 09:53

tomicapretto added 2 commits October 3, 2023 08:05

Merge branch 'main' of github.com:jt-lab/bambi

1387d44

Add clarity to if-else and PyTensor bugfix

9272a24

Merge branch 'main' of github.com:jt-lab/bambi into jt-main

b074ba1

tomicapretto approved these changes Oct 3, 2023

Listify conditional values so they can be sorted

8d7e161

tomicapretto force-pushed the main branch from b074ba1 to 1387d44 Compare October 3, 2023 12:44

tomicapretto merged commit 0e4d4fc into bambinos:main Oct 3, 2023
4 checks passed

GStechschulte mentioned this pull request Oct 3, 2023

hsgp tests failing due to incompatible Elemwise input shapes #730

Closed
