Grouped boxplots that don't drop levels #3345

Open
ldecicco-USGS opened this issue May 30, 2019 · 11 comments
Labels
feature a feature request or enhancement positions 🥇

Comments

@ldecicco-USGS

ldecicco-USGS commented May 30, 2019

How do I (or...can I?) create a grouped boxplot in ggplot2 that does not drop the levels?

Here's an example:

library(ggplot2)

df <- data.frame(x = rep(letters[1:2],6),
                 y = sample(100, size = 12),
                 type = rep(c("x","y","z"),4))

df2 <- df[!(df$x == "b" & df$type == "z"),]
df2$type <- factor(df2$type, levels = c("x","y","z"))

ggplot(data = df2) +
  geom_boxplot(aes(x=x,y=y, fill=type)) +
  scale_fill_discrete(drop=FALSE)

And what I get is:
image

What I would like is for "b" to have a 3rd empty grouping for type "z". I have tried adding an empty row like this:

df3 <- rbind(df2, 
             data.frame(x = "b",
                        y = NA,
                        type = "z"))

df3$type <- factor(df3$type, levels = c("x","y","z"))

ggplot(data = df3) +
  geom_boxplot(aes(x=x,y=y, fill=type)) +
  scale_fill_discrete(drop=FALSE)

But I get the same result.

There's a solution from 2013 involving fake data and changing limits...but I was hoping to see if anything had improved since then that I've missed:

df4 <- rbind(df2, 
             data.frame(x = "b",
                        y = 1000,
                        type = "z"))

df4$type <- factor(df4$type, levels = c("x","y","z"))

ggplot(data = df4) +
  geom_boxplot(aes(x=x,y=y, fill=type)) +
  scale_fill_discrete(drop=FALSE) +
  coord_cartesian(ylim = range(df2$y))

This is good, but seems hacky:
image

@paleolimbot
Member

I think you're looking for position_dodge2(preserve = "single"):

library(ggplot2)
df <- data.frame(x = rep(letters[1:2],6),
                 y = sample(100, size = 12),
                 type = rep(c("x","y","z"),4))

df2 <- df[!(df$x == "b" & df$type == "z"),]
df2$type <- factor(df2$type, levels = c("x","y","z"))

ggplot(df2) +
  geom_boxplot(
    aes(x = x, y = y, fill = type), 
    position = position_dodge2(preserve = "single")
  )

Created on 2019-05-31 by the reprex package (v0.2.1)

@ldecicco-USGS
Author

It's not perfect because you'd want the green boxplot to be lined up right at "b"...but does at least get the widths consistent. Thanks!

@paleolimbot
Member

I see what you mean. I don't think it's possible to do that with positions at the moment, but you could use facets to get a similar result:

library(ggplot2)
df <- data.frame(x = rep(letters[1:2],6),
                 y = sample(100, size = 12),
                 type = rep(c("x","y","z"),4))

df2 <- df[!(df$x == "b" & df$type == "z"),]
df2$type <- factor(df2$type, levels = c("x","y","z"))

ggplot(df2) +
  geom_boxplot(aes(x = type, y = y, col = type)) +
  facet_wrap(vars(x))

Created on 2019-05-31 by the reprex package (v0.2.1)

@paleolimbot added the "feature" (a feature request or enhancement) and "positions 🥇" labels on May 31, 2019
@ldecicco-USGS
Author

Yeah....I thought of that initially, but the actual plot I'm trying to make already uses facets. If I decide the position_dodge2 isn't good enough (...I think it probably is for my case), I might consider using patchwork or something like that to bring it all together.
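
For anyone who wants to try the patchwork route, here is a rough sketch. It assumes the patchwork package is installed. The `panels` list and the per-panel titles are made up for illustration and are not from my real plot. Each panel puts type on the x axis and keeps the unused level with scale_x_discrete(drop = FALSE), so "b" gets an empty slot for "z":

```r
library(ggplot2)
library(patchwork)

set.seed(1)  # only so the sketch is reproducible
df <- data.frame(x = rep(letters[1:2], 6),
                 y = sample(100, size = 12),
                 type = rep(c("x", "y", "z"), 4))

df2 <- df[!(df$x == "b" & df$type == "z"), ]
df2$type <- factor(df2$type, levels = c("x", "y", "z"))

# One panel per level of x (hypothetical helper list, not a ggplot2 feature)
panels <- lapply(split(df2, df2$x), function(d) {
  ggplot(d) +
    geom_boxplot(aes(x = type, y = y, fill = type)) +
    scale_x_discrete(drop = FALSE) +        # keep the empty "z" slot
    scale_fill_discrete(drop = FALSE) +     # keep "z" in the legend
    coord_cartesian(ylim = range(df2$y)) +  # same y range in every panel
    ggtitle(paste("x =", d$x[1]))
})

# Put the panels side by side and share a single legend
wrap_plots(panels, guides = "collect")
```

This is essentially the facet layout rebuilt by hand, so it only helps when facets are already taken by another variable.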

@hadley
Member

hadley commented Jun 18, 2019

I wonder if we should add a drop = FALSE option to position_dodge2() to tell it to use the factor levels, rather than the actual positions? (Or maybe it's too late at that point?)

@MarkErik

It's not perfect because you'd want the green boxplot to be lined up right at "b"...

+1 as I've encountered this issue many times (not for Boxplots, but other geoms), and as I also use facets, having a solution to keep width and position (so that if there is missing data, it won't centre the remaining items) would be excellent.

@hadley
Member

hadley commented Jun 24, 2019

If you wonder why something so seemingly simple is so hard to fix, I'd suggest watching @karawoo's excellent rstudio::conf() talk: https://resources.rstudio.com/rstudio-conf-2019/box-plots-a-case-study-in-debugging-and-perseverance

@ldecicco-USGS
Author

I hope creating an Issue to report non-ideal behavior like this doesn't imply I think it's a "seemingly simple fix"...because I don't think that at all!!! I've watched that talk already...it's fantastic...and my goal in creating an issue is to NOT have to do it myself!

@paleolimbot
Member

It is a continuing problem that comes up a lot (at least one other expert ggplotter has asked me about this exact issue), and we're glad to have it as an issue! Dropping levels works differently in scales, facets, and positions, and we don't have a workaround (of which I am aware) for this.

@ivan-paleo

I'm having exactly the same issue: dropped levels within one facet and I don't know how to adjust the boxes' width and positions for that facet.
Is there anything new on this issue?

@teunbrand
Collaborator

Since this came up recently in #6100 (comment), I'll put forth my two cents.

The root cause of the issue is that the group aesthetic carries no information of the original aesthetics that contributed to the grouping structure.

Take the plot from the top of this issue.

The internals can only know: "I have two positions, 3 groups at position 'a' and 2 groups at position 'b'". They do not know that groups 1 and 4 share type = "x", or that groups 2 and 5 share type = "y". What has contributed to these groups cannot be reconstructed from the information available to the position adjustment.
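
To see the grouping for yourself, a minimal sketch along these lines should work. It uses layer_data() to pull out the computed layer; `built` and `groups_per_x` are just illustrative names, and the rounding step is my own shortcut for recovering the discrete position from the dodged x values:

```r
library(ggplot2)

set.seed(1)  # only so the sketch is reproducible
df <- data.frame(x = rep(letters[1:2], 6),
                 y = sample(100, size = 12),
                 type = rep(c("x", "y", "z"), 4))

df2 <- df[!(df$x == "b" & df$type == "z"), ]
df2$type <- factor(df2$type, levels = c("x", "y", "z"))

p <- ggplot(df2) +
  geom_boxplot(aes(x = x, y = y, fill = type))

# One row per box; `group` is a plain integer id per x/type combination
built <- layer_data(p)
built[, c("x", "group")]

# Number of distinct groups at each discrete x position
# (the dodged x values are rounded back to the nearest position)
groups_per_x <- tapply(built$group, round(as.numeric(built$x)),
                       function(g) length(unique(g)))
groups_per_x
```

The group column is only a running integer, which is the point: nothing in it says which level of type a box belongs to.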

I have discussed this with Thomas a few times and neither of us has been able to come up with a great solution. Until one emerges, I don't think this will be fixed.
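
In the meantime, one workaround is to skip position adjustments and compute the dodged positions by hand from the factor levels. This is a sketch under my own assumptions, not a ggplot2 feature: `x_pos`, `offsets` and `dodge_width` are illustrative names, and the 0.75 total width and 0.9 box shrink are arbitrary choices.

```r
library(ggplot2)

set.seed(1)  # only so the sketch is reproducible
df <- data.frame(x = rep(letters[1:2], 6),
                 y = sample(100, size = 12),
                 type = rep(c("x", "y", "z"), 4))

df2 <- df[!(df$x == "b" & df$type == "z"), ]
df2$type <- factor(df2$type, levels = c("x", "y", "z"))

x_levels <- sort(unique(as.character(df2$x)))
n_types <- nlevels(df2$type)   # counts all levels, including missing ones
dodge_width <- 0.75            # assumed total width of one cluster of boxes

# Offset of each type's slot from the centre of its x position
offsets <- (seq_len(n_types) - (n_types + 1) / 2) * dodge_width / n_types

# Numeric x: integer position of the x level plus the slot offset of the type
df2$x_pos <- match(as.character(df2$x), x_levels) + offsets[as.integer(df2$type)]

p <- ggplot(df2) +
  geom_boxplot(
    aes(x = x_pos, y = y, fill = type, group = interaction(x, type)),
    width = 0.9 * dodge_width / n_types,
    position = "identity"      # positions are already dodged by hand
  ) +
  scale_x_continuous(breaks = seq_along(x_levels), labels = x_levels, name = "x") +
  scale_fill_discrete(drop = FALSE)
p
```

Because the offsets come from the factor levels rather than from the groups present, the slot for "z" at "b" stays empty and the other two boxes keep their width and position. The cost is a continuous x scale with hand-written breaks.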
