```{r echo = FALSE, message = FALSE}
# run setup script
source("_common.R")
library(lubridate)
library(ggridges)
```
# Redundant coding {#redundant-coding}
In Chapter \@ref(color-pitfalls), we have seen that color cannot always convey information as effectively as we might wish. If we have many different items we want to identify, doing so by color may not work. It will be difficult to match the colors in the plot to the colors in the legend (Figure \@ref(fig:popgrowth-vs-popsize-colored)). And even if we only need to distinguish two to three different items, color may fail if the colored items are very small (Figure \@ref(fig:colors-thin-lines)) and/or the colors look similar for people suffering from color-vision deficiency (Figures \@ref(fig:red-green-cvd-sim) and \@ref(fig:blue-green-cvd-sim)). The general solution in all these scenarios is to use color to enhance the visual appearance of the figure without relying entirely on color to convey key information. I refer to this design principle as *redundant coding*, because it prompts us to encode data redundantly, using multiple different aesthetic dimensions.
## Designing legends with redundant coding
Scatter plots of several groups of data are frequently designed such that the points representing different groups differ only in their color. As an example, consider Figure \@ref(fig:iris-scatter-one-shape), which shows the sepal width versus the sepal length of three different *Iris* species. (Sepals are the outer leaves of flowers in flowering plants.) The points representing the different species differ in their colors, but otherwise all points look exactly the same. Even though this figure contains only three distinct groups of points, it is difficult to read even for people with normal color vision. The problem arises because the data points for the two species *Iris virginica* and *Iris versicolor* intermingle, and their two respective colors, green and blue, are not particularly distinct from each other.
(ref:iris-scatter-one-shape) Sepal width versus sepal length for three different iris species (*Iris setosa*, *Iris virginica*, and *Iris versicolor*). Each point represents the measurements for one plant sample. A small amount of jitter has been applied to all point positions to prevent overplotting. The figure is labeled "bad" because the *virginica* points in green and the *versicolor* points in blue are difficult to distinguish from each other.
```{r iris-scatter-one-shape, fig.cap = '(ref:iris-scatter-one-shape)'}
breaks <- c("setosa", "virginica", "versicolor")
labels <- paste0("Iris ", breaks)
iris_scatter_base <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, fill = Species, color = Species)) +
  scale_color_manual(
    values = darken(c("#E69F00", "#56B4E9", "#009E73"), 0.3),
    breaks = breaks,
    labels = labels,
    name = NULL
  ) +
  scale_fill_manual(
    values = c("#E69F0080", "#56B4E980", "#009E7380"),
    breaks = breaks,
    labels = labels,
    name = NULL
  ) +
  scale_x_continuous(
    limits = c(3.95, 8.2), expand = c(0, 0),
    labels = c("4.0", "5.0", "6.0", "7.0", "8.0"),
    name = "sepal length"
  ) +
  scale_y_continuous(
    limits = c(1.9, 4.6), expand = c(0, 0),
    name = "sepal width"
  )
iris_scatter <- iris_scatter_base +
  geom_point(
    size = 2.5, shape = 21, stroke = 0.5,
    position = position_jitter(
      width = 0.01 * diff(range(iris$Sepal.Length)),
      height = 0.01 * diff(range(iris$Sepal.Width)),
      seed = 3942
    )
  ) +
  theme_dviz_grid() +
  theme(
    legend.title.align = 0.5,
    legend.text = element_text(face = "italic"),
    legend.spacing.y = unit(3.5, "pt"),
    plot.margin = margin(7, 7, 3, 1.5)
  )
stamp_bad(iris_scatter)
```
Surprisingly, the green and blue points look more distinct for people with red--green color-vision deficiency (deuteranomaly or protanomaly) than for people with normal color vision (compare Figure \@ref(fig:iris-scatter-one-shape-cvd), top row, to Figure \@ref(fig:iris-scatter-one-shape)). On the other hand, for people with blue--yellow deficiency (tritanomaly) the blue and green points look very similar (Figure \@ref(fig:iris-scatter-one-shape-cvd), bottom left). And if we print out the figure in grayscale (i.e., we *desaturate* the figure), we cannot distinguish any of the iris species (Figure \@ref(fig:iris-scatter-one-shape-cvd), bottom right).
(ref:iris-scatter-one-shape-cvd) Color-vision-deficiency simulation of Figure \@ref(fig:iris-scatter-one-shape).
```{r iris-scatter-one-shape-cvd, fig.width = 5.5*6/4.2, fig.asp = 0.66, fig.cap = '(ref:iris-scatter-one-shape-cvd)'}
iris_scatter_small <- iris_scatter_base +
geom_point(
size=.655*2.5, shape=21, stroke = .655*0.5,
position = position_jitter(
width = 0.01 * diff(range(iris$Sepal.Length)),
height = 0.01 * diff(range(iris$Sepal.Width)),
seed = 3942
)
) +
theme_dviz_grid(
.655*14,
line_size = .85*.5 # make line size a little bigger than mathematically correct
) +
theme(
legend.title.align = 0.5,
legend.text = element_text(face = "italic"),
legend.spacing.y = grid::unit(.655*3, "pt"),
plot.margin = margin(18, 1, 1, 1)
)
cvd_sim2(iris_scatter_small, label_size = 14, label_y = .98, scale = .95)
```
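The simulations in Figure \@ref(fig:iris-scatter-one-shape-cvd) are produced by a helper function, `cvd_sim2()`, that is defined outside this chapter. As a rough sketch of the same idea, and assuming the **colorspace** package is installed, we can apply its `deutan()`, `protan()`, `tritan()`, and `desaturate()` functions to the three base colors directly. This is a stand-in for illustration and not necessarily how `cvd_sim2()` works internally.

```r
library(colorspace)

# Base colors of the three iris species in the first scatter plot
# (outlines and fills in the figure are darkened or semi-transparent variants)
species_colors <- c(
  setosa     = "#E69F00",
  versicolor = "#56B4E9",
  virginica  = "#009E73"
)

# Simulate how each color appears under different color-vision deficiencies
# and after full desaturation
cvd_table <- data.frame(
  original    = species_colors,
  deutan      = deutan(species_colors),
  protan      = protan(species_colors),
  tritan      = tritan(species_colors),
  desaturated = desaturate(species_colors),
  stringsAsFactors = FALSE
)
cvd_table
```

Passing any of the resulting columns to a function that draws color swatches, such as `colorspace::swatchplot()`, makes it easy to see which pairs of colors collapse onto each other.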
There are two simple improvements we can make to Figure \@ref(fig:iris-scatter-one-shape) to alleviate these issues. First, we can swap the colors used for *Iris setosa* and *Iris versicolor*, so that the blue is no longer directly next to the green (Figure \@ref(fig:iris-scatter-three-shapes)). Second, we can use three different symbol shapes, so that the points all look different. With these two changes, both the original version of the figure (Figure \@ref(fig:iris-scatter-three-shapes)) and the versions under color-vision-deficiency and in grayscale (Figure \@ref(fig:iris-scatter-three-shapes-cvd)) become legible.
(ref:iris-scatter-three-shapes) Sepal width versus sepal length for three different iris species. Compared to Figure \@ref(fig:iris-scatter-one-shape), we have swapped the colors for *Iris setosa* and *Iris versicolor* and we have given each iris species its own point shape.
```{r iris-scatter-three-shapes, fig.cap = '(ref:iris-scatter-three-shapes)'}
iris_scatter2_base <- ggplot(
iris, aes(x = Sepal.Length, y = Sepal.Width, shape = Species, fill = Species, color = Species)
) +
scale_shape_manual(
values = c(21, 22, 23),
breaks = breaks,
labels = labels,
name = NULL
) +
scale_color_manual(
values = darken(c("#56B4E9", "#E69F00", "#009E73"), 0.3),
breaks = breaks,
labels = labels,
name = NULL
) +
scale_fill_manual(
values = c("#56B4E980", "#E69F0080", "#009E7380"),
breaks = breaks,
labels = labels,
name = NULL
) +
scale_x_continuous(
limits = c(3.95, 8.2), expand = c(0, 0),
labels = c("4.0", "5.0", "6.0", "7.0", "8.0"),
name = "sepal length"
) +
scale_y_continuous(
limits = c(1.9, 4.6), expand = c(0, 0),
name = "sepal width"
)
iris_scatter2 <- iris_scatter2_base +
geom_point(
size=2.5, stroke = 0.5,
position = position_jitter(
width = 0.01 * diff(range(iris$Sepal.Length)),
height = 0.01 * diff(range(iris$Sepal.Width)),
seed = 3942)
) +
theme_dviz_grid() +
theme(
legend.title.align = 0.5,
legend.text = element_text(face = "italic"),
legend.spacing.y = unit(3.5, "pt"),
plot.margin = margin(7, 7, 3, 1.5)
)
iris_scatter2
```
(ref:iris-scatter-three-shapes-cvd) Color-vision-deficiency simulation of Figure \@ref(fig:iris-scatter-three-shapes). Because of the use of different point shapes, even the fully desaturated grayscale version of the figure is legible.
```{r iris-scatter-three-shapes-cvd, fig.width = 5.5*6/4.2, fig.asp = 0.66, fig.cap = '(ref:iris-scatter-three-shapes-cvd)'}
iris_scatter2_small <- iris_scatter2_base +
geom_point(
size=.655*2.5, stroke = .655*0.5,
position = position_jitter(
width = 0.01 * diff(range(iris$Sepal.Length)),
height = 0.01 * diff(range(iris$Sepal.Width)),
seed = 3942)
) +
theme_dviz_grid(
.655*14,
line_size = .85*.5 # make line size a little bigger than mathematically correct
) +
theme(
legend.title.align = 0.5,
legend.text = element_text(face = "italic"),
legend.spacing.y = grid::unit(.655*3, "pt"),
plot.margin = margin(18, 1, 1, 1)
)
cvd_sim2(iris_scatter2_small, label_size = 14, label_y = .98, scale = .95)
```
Changing the point shape is a simple strategy for scatter plots but it doesn't necessarily work for other types of plots. In line plots, we could change the line type (solid, dashed, dotted, etc., see also Figure \@ref(fig:common-aesthetics)), but using dashed or dotted lines often yields sub-optimal results. In particular, dashed or dotted lines usually don't look good unless they are perfectly straight or only gently curved, and in either case they create visual noise. Also, it frequently requires significant mental effort to match different types of dash or dot--dash patterns from the plot to the legend. So what do we do with a visualization such as Figure \@ref(fig:tech-stocks-bad-legend), which uses lines to show the change in stock price over time for four different major tech companies?
(ref:tech-stocks-bad-legend) Stock price over time for four major tech companies. The stock price for each company has been normalized to equal 100 in June 2012. This figure is labeled as "bad" because it takes considerable mental energy to match the company names in the legend to the data curves. Data source: Yahoo Finance
```{r tech-stocks-bad-legend, fig.cap = '(ref:tech-stocks-bad-legend)'}
price_plot_base <- ggplot(tech_stocks, aes(x = date, y = price_indexed, color = ticker)) +
  geom_line(size = 0.66, na.rm = TRUE) +
  scale_color_manual(
    values = c("#000000", "#E69F00", "#56B4E9", "#009E73"),
    name = "",
    breaks = c("GOOG", "AAPL", "FB", "MSFT"),
    labels = c("Alphabet", "Apple", "Facebook", "Microsoft")
  ) +
  scale_x_date(
    name = "year",
    limits = c(ymd("2012-06-01"), ymd("2017-05-31")),
    expand = c(0, 0)
  ) +
  scale_y_continuous(
    name = "stock price, indexed",
    limits = c(0, 560),
    expand = c(0, 0)
  )
stamp_bad(
  price_plot_base +
    theme_dviz_hgrid() +
    theme(plot.margin = margin(3, 7, 3, 1.5))
)
```
The figure contains four lines representing the stock prices of the four different companies. The lines are color coded using a colorblind-friendly color scale. So it should be relatively straightforward to associate each line with the corresponding company. Yet it is not. The problem here is that the data lines have a clear visual order. The yellow line, representing Facebook, is clearly the highest line, and the black line, representing Apple, is clearly the lowest, with Alphabet and Microsoft in between, in that order. Yet the order of the four companies in the legend is Alphabet, Apple, Facebook, Microsoft (alphabetic order). Thus, the perceived order of the data lines differs from the order of the companies in the legend, and it takes a surprising amount of mental effort to match data lines with company names.
This problem arises commonly with plotting software that autogenerates legends. The plotting software has no concept of the visual order the viewer will perceive. Instead, the software sorts the legend by some other order, most commonly alphabetical. We can fix this problem by manually reordering the entries in the legend so they match the perceived ordering in the data (Figure \@ref(fig:tech-stocks-good-legend)). The result is a figure that makes it much easier to match the legend to the data.
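The figures in this chapter set the legend order by hand. As a minimal sketch of how the order could instead be derived from the data, the example below ranks the groups by their value at the right-most *x* position and passes that ranking to the `breaks` argument of the color scale. The data frame `toy` and its values are hypothetical and serve only as an illustration.

```r
library(ggplot2)

# Toy data (hypothetical values, not the stock prices used in this chapter)
toy <- data.frame(
  x = rep(1:3, times = 3),
  y = c(1, 2, 3,  1, 3, 6,  1, 1.5, 2),
  group = rep(c("a", "b", "c"), each = 3),
  stringsAsFactors = FALSE
)

# Rank the groups by their value at the right-most x position, highest first
last_points <- toy[toy$x == max(toy$x), ]
legend_order <- last_points$group[order(last_points$y, decreasing = TRUE)]

# The legend entries now appear in the same top-to-bottom order as the line ends
p <- ggplot(toy, aes(x, y, color = group)) +
  geom_line() +
  scale_color_discrete(breaks = legend_order, name = NULL)
p
```

Ranking by the final value works well when the lines fan out toward the right, as they do in the stock-price figure. If the lines cross repeatedly, a different summary (or a manual order) may match the perceived ordering better.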
(ref:tech-stocks-good-legend) Stock price over time for four major tech companies. The stock price for each company has been normalized to equal 100 in June 2012. Data source: Yahoo Finance
```{r tech-stocks-good-legend, fig.cap = '(ref:tech-stocks-good-legend)'}
price_plot_base_good <- ggplot(tech_stocks, aes(x = date, y = price_indexed, color = ticker)) +
  scale_color_manual(
    values = c("#000000", "#E69F00", "#56B4E9", "#009E73"),
    name = "",
    breaks = c("FB", "GOOG", "MSFT", "AAPL"),
    labels = c("Facebook", "Alphabet", "Microsoft", "Apple")
  ) +
  scale_x_date(
    name = "year",
    limits = c(ymd("2012-06-01"), ymd("2017-05-31")),
    expand = c(0, 0)
  ) +
  scale_y_continuous(
    name = "stock price, indexed",
    limits = c(0, 560),
    expand = c(0, 0)
  )
price_plot_base_good +
  geom_line(size = 0.66, na.rm = TRUE) +
  theme_dviz_hgrid() +
  theme(plot.margin = margin(3, 7, 3, 1.5))
```
```{block type='rmdtip', echo=TRUE}
If there is a clear visual ordering in your data, make sure to match it in the legend.
```
Matching the legend order to the data order is always helpful, but the benefits are particularly obvious under color-vision deficiency simulation (Figure \@ref(fig:tech-stocks-good-legend-cvd)). For example, it helps in the tritanomaly version of the figure, where the blue and the green become difficult to distinguish (Figure \@ref(fig:tech-stocks-good-legend-cvd), bottom left). It also helps in the grayscale version (Figure \@ref(fig:tech-stocks-good-legend-cvd), bottom right). Even though the two colors for Facebook and Alphabet have virtually the same gray value, we can see that Microsoft and Apple are represented by darker colors and take the bottom two spots. Therefore, we correctly assume that the highest line corresponds to Facebook and the second-highest line to Alphabet.
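To see why two of the four colors collapse in grayscale, we can estimate a gray value for each color. The sketch below uses the Rec. 601 luma weights (0.299, 0.587, 0.114) as a simple approximation. This choice is an assumption made for illustration, and the simulation in the figure may use a different conversion. The assignment of colors to tickers follows the color scale in the preceding code.

```r
# Colors as assigned to the four tickers in the stock-price figures
stock_colors <- c(
  AAPL = "#000000",
  FB   = "#E69F00",
  GOOG = "#56B4E9",
  MSFT = "#009E73"
)

# 3 x 4 matrix with rows red, green, blue (each 0-255), one column per ticker
rgb_values <- grDevices::col2rgb(stock_colors)

# Approximate gray value: weighted sum of the three channels
# (the weights are recycled down each column, i.e., applied per channel)
gray_value <- colSums(rgb_values * c(0.299, 0.587, 0.114))

# Lightest first
round(sort(gray_value, decreasing = TRUE))
```

Under this approximation, the values for Facebook and Alphabet differ by only a few units on the 0 to 255 scale, whereas Microsoft and Apple are clearly darker.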
(ref:tech-stocks-good-legend-cvd) Color-vision-deficiency simulation of Figure \@ref(fig:tech-stocks-good-legend).
```{r tech-stocks-good-legend-cvd, fig.width = 5.5*6/4.2, fig.asp = 0.66, fig.cap = '(ref:tech-stocks-good-legend-cvd)'}
price_plot_good_small <-
  price_plot_base_good +
  geom_line(
    size = .85*0.66, # make line size a little bigger than mathematically correct
    na.rm = TRUE
  ) +
  theme_dviz_hgrid(
    .655*14,
    line_size = .85*.5 # make line size a little bigger than mathematically correct
  ) +
  theme(plot.margin = margin(18, 1, 1, 1))
cvd_sim2(price_plot_good_small, label_size = 14, label_y = .98, scale = .95)
```
## Designing figures without legends
Even though legend legibility can be improved by encoding data redundantly, in multiple aesthetics, legends always put an extra mental burden on the reader. In reading a legend, the reader needs to pick up information in one part of the visualization and then transfer it over to a different part. We can typically make our readers' lives easier if we eliminate the legend altogether. Eliminating the legend does not mean, however, that we simply omit it and instead write sentences such as "The yellow dots represent *Iris versicolor*" in the figure caption. Eliminating the legend means that we design the figure in such a way that it is immediately obvious what the various graphical elements represent, even if no explicit legend is present.
The general strategy we can employ is called *direct labeling*, whereby we place appropriate text labels or other visual elements that serve as guideposts to the rest of the figure. We have previously encountered direct labeling in Chapter \@ref(color-pitfalls) (Figure \@ref(fig:popgrowth-vs-popsize-bw)), as an alternative to drawing a legend with over 50 distinct colors. To apply the direct labeling concept to the stock-price figure, we place the name of each company right next to the end of its respective data line (Figure \@ref(fig:tech-stocks-good-no-legend)).
(ref:tech-stocks-good-no-legend) Stock price over time for four major tech companies. The stock price for each company has been normalized to equal 100 in June 2012. Data source: Yahoo Finance
```{r tech-stocks-good-no-legend, fig.cap = '(ref:tech-stocks-good-no-legend)'}
price_plot <- price_plot_base_good +
  geom_line(size = 0.66, na.rm = TRUE) +
  theme_dviz_hgrid()
yann <- axis_canvas(price_plot, axis = "y") +
  geom_text(
    data = filter(tech_stocks, date == "2017-06-02"),
    aes(y = price_indexed, label = paste0(" ", company)),
    family = dviz_font_family,
    x = 0, hjust = 0, size = 12/.pt
  )
price_plot_ann <- insert_yaxis_grob(
  price_plot + theme(legend.position = "none"),
  yann,
  width = grid::unit(0.3, "null")
)
ggdraw(price_plot_ann)
```
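The figure above places its labels in a separate canvas attached to the right side of the plot. A simpler, if less flexible, variant of direct labeling needs only **ggplot2**: draw the labels with `geom_text()` at the last point of each line and widen the *x* axis so that they fit inside the panel. The sketch below uses a hypothetical data frame `toy` and is meant as an illustration, not as a reproduction of the figure.

```r
library(ggplot2)

# Toy data (hypothetical values, not the stock prices used in this chapter)
toy <- data.frame(
  x = rep(1:3, times = 3),
  y = c(1, 2, 3,  1, 3, 6,  1, 1.5, 2),
  group = rep(c("a", "b", "c"), each = 3),
  stringsAsFactors = FALSE
)

# One row per line: the right-most point, where the label will be placed
line_ends <- toy[toy$x == max(toy$x), ]

p <- ggplot(toy, aes(x, y, color = group)) +
  geom_line() +
  # left-aligned labels, nudged slightly to the right of each line end
  geom_text(
    data = line_ends,
    aes(label = group),
    hjust = 0, nudge_x = 0.05
  ) +
  # leave extra room on the right so that the labels are not cut off
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.2))) +
  # the labels replace the legend
  guides(color = "none")
p
```

Labels placed this way can overlap when two lines end close together, in which case they need to be nudged apart by hand.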
```{block type='rmdtip', echo=TRUE}
Whenever possible, design your figures so they don't need a legend.
```
We can also apply the direct labeling concept to the iris data from the beginning of this chapter, specifically Figure \@ref(fig:iris-scatter-three-shapes). Because it is a scatter plot of many points that separate into three different groups, we need to direct-label the groups rather than the individual points. One solution is to draw ellipses that enclose the majority of the points and then label the ellipses (Figure \@ref(fig:iris-scatter-with-ellipses)).
(ref:iris-scatter-with-ellipses) Sepal width versus sepal length for three different iris species. I have removed the background grid from this figure because otherwise the figure was becoming too busy.
```{r iris-scatter-with-ellipses, fig.width = 4.6, fig.asp = 0.8, fig.cap = '(ref:iris-scatter-with-ellipses)'}
label_df <- data.frame(
Species = c("setosa", "virginica", "versicolor"),
label = c("Iris setosa", "Iris virginica", "Iris versicolor"),
Sepal.Width = c(4.2, 3.76, 2.08),
Sepal.Length = c(5.7, 7, 5.1),
hjust = c(0, 0.5, 0),
vjust = c(0, 0.5, 1))
iris_scatter3 <- ggplot(iris,
aes(
x = Sepal.Length,
y = Sepal.Width,
color = Species
)
) +
geom_point(
aes(shape = Species, fill = Species),
size = 2.5,
position = position_jitter(
width = 0.01 * diff(range(iris$Sepal.Length)),
height = 0.01 * diff(range(iris$Sepal.Width)),
seed = 3942)
) +
stat_ellipse(size = 0.5) +
geom_text(
data = label_df,
aes(
x = Sepal.Length, y = Sepal.Width, label = label, color = Species,
hjust = hjust, vjust = vjust
),
family = dviz_font_family,
size = 14/.pt,
fontface = "italic",
inherit.aes = FALSE
) +
scale_shape_manual(
values = c(21, 22, 23),
breaks = breaks,
name = NULL
) +
scale_fill_manual(
values = c("#56B4E980", "#E69F0080", "#009E7380"),
breaks = breaks,
name = NULL
) +
scale_color_manual(
values = darken(c("#56B4E9", "#E69F00", "#009E73"), 0.3),
breaks = breaks,
name = NULL
) +
guides(fill = "none", color = "none", shape = "none") +
scale_x_continuous(
limits = c(3.95, 8.2), expand = c(0, 0),
labels = c("4.0", "5.0", "6.0", "7.0", "8.0"),
name = "sepal length"
) +
scale_y_continuous(
limits = c(1.9, 4.6), expand = c(0, 0),
name = "sepal width"
) +
theme_dviz_open()
iris_scatter3
```
For density plots, we can similarly direct-label the curves rather than providing a color-coded legend (Figure \@ref(fig:iris-densities-direct-label)). In both Figures \@ref(fig:iris-scatter-with-ellipses) and \@ref(fig:iris-densities-direct-label), I have colored the text labels in the same colors as the data. Colored labels can greatly enhance the direct labeling effect, but they can also turn out very poorly. If the text labels are printed in a color that is too light, then the labels become difficult to read. And, because text consists of very thin lines, colored text often appears to be lighter than an adjacent filled area of the same color. I generally circumvent these issues by using two different shades of each color, a light one for filled areas and a dark one for lines, outlines, and text. If you carefully inspect Figure \@ref(fig:iris-scatter-with-ellipses) or \@ref(fig:iris-densities-direct-label), you will see how each data point or shaded area is filled with a light color and has an outline drawn in a darker color of the same hue. And the text labels are drawn in the same darker colors.
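The two-shade approach can be sketched in a few lines. The example below assumes that `darken()` is the function from the **colorspace** package. It derives the light variant by appending a two-digit hexadecimal alpha value to each color and the dark variant by darkening each color by an amount of 0.3, the same two operations that appear in the plotting code of this chapter.

```r
library(colorspace)

base_colors <- c("#56B4E9", "#E69F00", "#009E73")

# Light variant for filled areas: append a hex alpha value
# ("80" corresponds to roughly 50% opacity)
fill_colors <- paste0(base_colors, "80")

# Dark variant for lines, outlines, and text
line_colors <- darken(base_colors, 0.3)

data.frame(base_colors, fill_colors, line_colors)
```

Because the fill is semi-transparent, its apparent lightness depends on the background color, so the amount of transparency may need adjusting on non-white backgrounds.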
(ref:iris-densities-direct-label) Density estimates of the sepal lengths of three different iris species. Each density estimate is directly labeled with the respective species name.
```{r iris-densities-direct-label, fig.cap = '(ref:iris-densities-direct-label)'}
# compute densities for sepal lengths
iris_dens <- group_by(iris, Species) %>%
do(ggplot2:::compute_density(.$Sepal.Length, NULL)) %>%
rename(Sepal.Length = x)
# get the maximum values
iris_max <- filter(iris_dens, density == max(density)) %>%
ungroup() %>%
mutate(
hjust = c(0, 0.4, 0),
vjust = c(1, 0, 1),
nudge_x = c(0.11, 0, 0.24),
nudge_y = c(-0.02, 0.02, -0.02),
label = paste0("Iris ", Species)
)
iris_p <- ggplot(iris_dens, aes(x = Sepal.Length, y = density, fill = Species, color = Species)) +
geom_density_line(stat = "identity") +
geom_text(
data = iris_max,
aes(
label = label, hjust = hjust, vjust = vjust, color = Species,
x = Sepal.Length + nudge_x,
y = density + nudge_y
),
family = dviz_font_family,
size = 14/.pt,
inherit.aes = FALSE,
fontface = "italic"
) +
scale_color_manual(
values = darken(c("#56B4E9", "#E69F00", "#009E73"), 0.3),
breaks = c("virginica", "versicolor", "setosa"),
guide = "none"
) +
scale_fill_manual(
values = c("#56B4E950", "#E69F0050", "#009E7350"),
breaks = c("virginica", "versicolor", "setosa"),
guide = "none"
) +
scale_x_continuous(expand = c(0, 0), name = "sepal length") +
scale_y_continuous(limits = c(0, 1.5), expand = c(0, 0)) +
coord_cartesian(clip = "off") +
theme_dviz_hgrid() +
theme(
axis.line.x = element_blank(),
plot.margin = margin(6.5, 1.5, 3.5, 1.5)
)
iris_p
```
We can also use density plots such as the one in Figure \@ref(fig:iris-densities-direct-label) as a legend replacement, by placing the density plots into the margins of a scatter plot (Figure \@ref(fig:iris-scatter-dens)). This allows us to direct-label the marginal density plots rather than the central scatter plot and hence results in a figure that is somewhat less cluttered than Figure \@ref(fig:iris-scatter-with-ellipses), which uses directly labeled ellipses.
(ref:iris-scatter-dens) Sepal width versus sepal length for three different iris species, with marginal density estimates of each variable for each species.
```{r iris-scatter-dens, fig.width = 5*6/4.2, fig.asp=0.85, fig.cap = '(ref:iris-scatter-dens)'}
# compute densities for sepal widths
iris_dens2 <- group_by(iris, Species) %>%
do(ggplot2:::compute_density(.$Sepal.Width, NULL)) %>%
rename(Sepal.Width = x)
dens_limit <- max(iris_dens$density, iris_dens2$density) * 1.05 # upper limit of density curves
# we need different hjust and nudge values here
iris_max <-
iris_max %>%
mutate(
hjust = c(1, 0.4, 0),
vjust = c(1, 0, 1),
nudge_x = c(-0.18, 0, 0.47),
nudge_y = c(-0.01, 0.06, 0.03),
label = paste0("Iris ", Species)
)
xdens <- axis_canvas(iris_scatter2, axis = "x") +
geom_density_line(
data=iris_dens,
aes(x = Sepal.Length, y = density, fill = Species, color = Species),
stat = "identity", size = .2
) +
geom_text(
data = iris_max,
aes(
label = label, hjust = hjust, vjust = vjust, color = Species,
x = Sepal.Length + nudge_x,
y = density + nudge_y
),
family = dviz_font_family,
size = 12/.pt,
#color = "black", inherit.aes = FALSE,
fontface = "italic"
) +
scale_color_manual(
values = darken(c("#56B4E9", "#E69F00", "#009E73"), 0.3),
breaks = c("virginica", "versicolor", "setosa"),
guide = "none"
) +
scale_fill_manual(
values = c("#56B4E950", "#E69F0050", "#009E7350"),
breaks = c("virginica", "versicolor", "setosa"),
guide = "none"
) +
scale_y_continuous(limits = c(0, dens_limit), expand = c(0, 0))
ydens <- axis_canvas(iris_scatter2, axis = "y", coord_flip = TRUE) +
geom_density_line(
data = iris_dens2,
aes(x = Sepal.Width, y = density, fill = Species, color = Species),
stat = "identity", size = .2
) +
scale_color_manual(
values = darken(c("#56B4E9", "#E69F00", "#009E73"), 0.3),
breaks = c("virginica", "versicolor", "setosa"),
guide = "none"
) +
scale_fill_manual(
values = c("#56B4E950", "#E69F0050", "#009E7350"),
breaks = c("virginica", "versicolor", "setosa"),
guide = "none"
) +
scale_y_continuous(limits = c(0, dens_limit), expand = c(0, 0)) +
coord_flip()
p1 <- insert_xaxis_grob(
iris_scatter2 + theme(legend.position = "none"),
xdens,
grid::unit(3*14, "pt"), position = "top"
)
p2 <- insert_yaxis_grob(p1, ydens, grid::unit(3*14, "pt"), position = "right")
ggdraw(p2)
```
And finally, whenever we encode a single variable in multiple aesthetics, we don't normally want multiple separate legends for the different aesthetics. Instead, there should be only a single legend-like visual element that conveys all mappings at once. In the case where we map the same variable onto a position along a major axis and onto color, this implies that the reference color bar should run along and be integrated into the same axis. Figure \@ref(fig:temp-ridgeline-colorbar) shows a case where we map temperature to both a position along the *x* axis and onto color, and where we therefore have integrated the color legend into the *x* axis.
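For discrete variables, the same principle applies to ordinary legends. As a minimal sketch, the example below maps `Species` from the built-in `iris` data set to both color and shape. Because the two scales share the same title (here, none) and the same labels, **ggplot2** draws one combined legend rather than two. The shape codes 16, 17, and 15 are arbitrary choices made for this illustration.

```r
library(ggplot2)

# Species is mapped to both color and shape; the two scales have identical
# titles and labels, so their legends are merged into a single legend
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species, shape = Species)) +
  geom_point() +
  scale_color_manual(values = c("#56B4E9", "#E69F00", "#009E73"), name = NULL) +
  scale_shape_manual(values = c(16, 17, 15), name = NULL)
p
```

If the two scales are given different names or different labels, the legends are no longer merged and the figure shows two separate legends for the same variable.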
(ref:temp-ridgeline-colorbar) Temperatures in Lincoln, Nebraska, in 2016. This figure is a variation of Figure \@ref(fig:temp-ridgeline). Temperature is now shown both by location along the *x* axis and by color, and a color bar along the *x* axis visualizes the scale that converts temperatures into colors.
```{r temp-ridgeline-colorbar, fig.asp = 0.652, fig.cap = '(ref:temp-ridgeline-colorbar)'}
bandwidth <- 3.4
lincoln_base <- ggplot(lincoln_weather, aes(x = `Mean Temperature [F]`, y = `Month`, fill = ..x..)) +
geom_density_ridges_gradient(
scale = 3, rel_min_height = 0.01, bandwidth = bandwidth,
color = "black", size = 0.25
) +
scale_x_continuous(
name = "mean temperature (°F)",
expand = c(0, 0), breaks = c(0, 25, 50, 75), labels = NULL
) +
scale_y_discrete(name = NULL, expand = c(0, .2, 0, 2.6)) +
scale_fill_continuous_sequential(
palette = "Heat",
l1 = 20, l2 = 100, c2 = 0,
rev = FALSE
) +
guides(fill = "none") +
theme_dviz_grid() +
theme(
axis.text.y = element_text(vjust = 0),
plot.margin = margin(3, 7, 3, 1.5)
)
# x axis labels
temps <- data.frame(temp = c(0, 25, 50, 75))
# calculate corrected color ranges
# stat_density_ridges (formerly stat_joy) uses the +/- 3*bandwidth calculation internally
tmin <- min(lincoln_weather$`Mean Temperature [F]`) - 3*bandwidth
tmax <- max(lincoln_weather$`Mean Temperature [F]`) + 3*bandwidth
xax <- axis_canvas(lincoln_base, axis = "x", ylim = c(0, 2)) +
geom_ridgeline_gradient(
data = data.frame(temp = seq(tmin, tmax, length.out = 100)),
aes(x = temp, y = 1.1, height = .9, fill = temp),
color = "transparent"
) +
geom_text(
data = temps, aes(x = temp, label = temp),
color = "black", family = dviz_font_family,
y = 0.9, hjust = 0.5, vjust = 1, size = 14/.pt
) +
scale_fill_continuous_sequential(
palette = "Heat",
l1 = 20, l2 = 100, c2 = 0,
rev = FALSE
)
lincoln_final <- insert_xaxis_grob(lincoln_base, xax, position = "bottom", height = unit(0.1, "null"))
ggdraw(lincoln_final)
```