log10 transformation gives the wrong scales with geom_bar #4751

Closed
nicolerg opened this issue Mar 4, 2022 · 6 comments

@nicolerg

nicolerg commented Mar 4, 2022

Hello,

When there is more than one entry per x value in geom_bar(), scale_y_continuous(trans="log10") (or equivalently scale_y_log10()) labels the y-axis incorrectly. In the second example below, the maximum total should be 5e3 (2500 + 2500), but the axis labels go up to 1e7.

Expected behavior with simple data (one entry per x value):

library(ggplot2)

df = data.frame(x=1:5, y=c(2, 20, 200, 400, 5000))

ggplot(df, aes(x=factor(x), y=y)) +
  geom_bar(stat="identity") +
  theme_classic() +
  scale_y_continuous(trans="log10")

Bug with two entries per x value:

df = data.frame(x=rep(1:5, 2),
                y=rep(c(1, 10, 100, 200, 2500),2), 
                group=rep(letters[1:5], each=2))

ggplot(df, aes(x=factor(x), y=y)) +
  geom_bar(stat="identity") +
  theme_classic() +
  scale_y_continuous(trans="log10") 

It is also a bit odd that the log10 transformation seems to be applied per entry: the total for x == 1 is 2, but the bar has zero height because log10(1) + log10(1) = 0, whereas I would expect a height of log10(1 + 1).

This is with ggplot2_3.3.5.
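The x == 1 arithmetic can be checked without ggplot2. The base-R sketch below (illustrative only; the variable names are made up for this example) compares what stacking after the log10 transform effectively draws, the sum of the per-entry logs, with the log10 of the per-x total that the report expected:

```r
# Data from the report: every x value has two entries with the same y
df <- data.frame(
  x = rep(1:5, 2),
  y = rep(c(1, 10, 100, 200, 2500), 2)
)

# Stacking in transformed space adds the logs: log10(a) + log10(b)
sum_of_logs <- tapply(log10(df$y), df$x, sum)

# Expected height: log10 of the per-x total, log10(a + b)
log10_total <- log10(tapply(df$y, df$x, sum))

round(cbind(sum_of_logs, log10_total), 3)
```

For x == 1 the first column is 0 while the second is log10(2), about 0.301; for x == 5 they are about 6.796 and 3.699.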

@yutannihilation
Member

yutannihilation commented Mar 5, 2022

Thanks for reporting. I can reproduce this with the dev version as well. The labels match the bars if I take the sqrt() of them, but I'm not sure what's happening here.

library(ggplot2)

df <- data.frame(
  x = rep(1:5, 2),
  y = rep(c(1, 10, 100, 200, 2500), 2),
  group = rep(letters[1:5], each = 2)
)

b <- 10 ^ (1:7)

ggplot(df, aes(x = factor(x), y = y)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = b, colour = alpha("red", 0.5)) +
  theme_classic() +
  scale_y_continuous(trans = "log10", breaks = b, labels = scales::label_comma())

label_tweak <- function(x) {
  x <- sqrt(x)
  scales::label_comma()(x)
}

ggplot(df, aes(x = factor(x), y = y)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = b, colour = alpha("red", 0.5)) +
  theme_classic() +
  scale_y_continuous(trans = "log10", breaks = b, labels = label_tweak)

Created on 2022-03-05 by the reprex package (v2.0.1)
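A short base-R calculation (not from the thread) suggests why sqrt() lines the labels up. Every x value in this data appears exactly twice with the same y, so the stacked bar top sits at 2 * log10(y) in transformed units, which the log10 axis labels as y^2:

```r
# Per-entry values; in the reprex data each appears exactly twice
y <- c(1, 10, 100, 200, 2500)

# Two equal entries stacked in log10 space reach log10(y) + log10(y) = log10(y^2)
stack_top_log <- 2 * log10(y)

# The log10 axis labels that height with the back-transformed value, i.e. y^2
label_at_top <- 10^stack_top_log

# Tallest stack: 2500^2 = 6.25e6
max(label_at_top)

# sqrt() undoes the doubling and recovers the per-entry value y,
# which is what label_tweak() exploits
sqrt(label_at_top)
```

The tallest stack is therefore labelled about 6.25e6, consistent with the axis in the report running up to 1e7. Note that sqrt() recovers the value of a single entry, not the per-x total of 2 * y.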

@teunbrand
Collaborator

teunbrand commented Mar 5, 2022

I think what is going on is the same as in #4731, in that the stacking is applied after log-transforming the values, giving effective heights of log(a) + log(b) == log(a * b) instead of log(a + b).

library(ggplot2)

df <- data.frame(
  x = rep(1:5, 2),
  y = rep(c(1, 10, 100, 200, 2500), 2),
  group = rep(letters[1:5], each = 2)
)

b <- 10 ^ (1:7)

p <- ggplot(df, aes(x = factor(x), y = y, fill = factor(seq_along(x)))) +
  scale_y_continuous(trans = "log10")

p + geom_bar(stat = "identity", position = "stack")

p + geom_bar(stat = "identity", position = "dodge")

Created on 2022-03-05 by the reprex package (v2.0.1)
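The explanation above can also be checked against ggplot2's own built data. This sketch relies on ggplot2's layer_data(), which returns a layer's computed data with positions in transformed (here log10) units, and uses scale_y_log10(), which applies the same transformation as trans = "log10". If stacking happens after the transform, the top of each stack should equal log10(a) + log10(b), i.e. log10(a * b):

```r
library(ggplot2)

df <- data.frame(
  x = rep(1:5, 2),
  y = rep(c(1, 10, 100, 200, 2500), 2)
)

# Default position for geom_bar() is "stack"
p_stack <- ggplot(df, aes(x = factor(x), y = y)) +
  geom_bar(stat = "identity") +
  scale_y_log10()

# Built layer data; ymin/ymax are already in log10 units
ld <- layer_data(p_stack)

# Top of each stack, one value per discrete x position
stacked_top <- tapply(ld$ymax, as.numeric(ld$x), max)

# Compare with log10 of the per-x product, i.e. log10(a) + log10(b)
cbind(stacked_top, log10_product = log10(tapply(df$y, df$x, prod)))
```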

@yutannihilation
Member

Ah, thanks for the explanation. Sorry I didn't (and don't) have enough time to read and understand #4731. Closing this as it seems to be a duplicate of #4731.

@fwaegena

How can this be solved if you want a stacked bar plot with a log-transformed y scale?

@teunbrand
Collaborator

We discourage stacking in combination with a non-linear scale transformation.
From ?position_stack:

Because stacking is performed after scale transformations, stacking with non-linear scales gives distortions that easily lead to misinterpretations of the data. It is therefore discouraged to use these position adjustments in combination with scale transformations, such as logarithmic or square root scales.

I recommend dodging instead.
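If the goal is the one stated in the original report (bar height equal to log10 of the per-x total), one option, offered here as an assumption rather than something suggested in the thread, is to sum the values in linear space before plotting, so there is nothing left for position_stack to stack after the transform. This gives up the per-group breakdown; dodging keeps it.

```r
library(ggplot2)

df <- data.frame(
  x = rep(1:5, 2),
  y = rep(c(1, 10, 100, 200, 2500), 2)
)

# Sum in linear space first: one row (and one bar) per x value
totals <- aggregate(y ~ x, data = df, FUN = sum)

# Each bar now reaches log10(total), e.g. log10(1 + 1) for x == 1
p_totals <- ggplot(totals, aes(x = factor(x), y = y)) +
  geom_col() +
  scale_y_log10() +
  theme_classic()
```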

@clauswilke
Member

@fwaegena In most cases, bar plots with a log-scaled y axis are a bad idea anyway (even though they are widely used in microbiology), because the bar length can never accurately reflect the data values: on a log scale, zero lies infinitely far down the axis, so a bar starting at zero would be infinitely long. This is discussed here: https://clauswilke.com/dataviz/proportional-ink.html#visualizations-along-logarithmic-axes
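A common alternative to bars on a log axis is to draw points, which mark each value by position without implying a bar length measured from a baseline. A minimal sketch, using the simple data from the original report and scale_y_log10() in place of trans = "log10":

```r
library(ggplot2)

df <- data.frame(x = 1:5, y = c(2, 20, 200, 400, 5000))

# Points encode position only, so no bar length is anchored at an arbitrary baseline
p_points <- ggplot(df, aes(x = factor(x), y = y)) +
  geom_point(size = 3) +
  scale_y_log10() +
  theme_classic()
```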
