Jenny Bryan
Note: this HTML is made by applying knitr::spin()
to an R script. So the
narrative is very minimal.
library(ggplot2)
pick a way to load the data
#gdURL <- "http://tiny.cc/gapminder"
#gapminder <- read.delim(file = gdURL)
#gapminder <- read.delim("gapminderDataFiveYear.tsv")
library(gapminder)
str(gapminder)
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : num 1952 1957 1962 1967 1972 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ gdpPercap: num 779 821 853 836 740 ...
stripplots: univariate scatterplots (but w/ ways to also convey 1+ factors)
ggplot(gapminder, aes(x = continent, y = lifeExp)) + geom_point()
we have an overplotting problem; need to spread things out
ggplot(gapminder, aes(x = continent, y = lifeExp)) + geom_jitter()
we can have less jitter in x, no jitter in y, more alpha transparency
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_jitter(position = position_jitter(width = 0.1, height = 0), alpha = 1/4)
boxplots -- covered properly elsewhere
ggplot(gapminder, aes(x = continent, y = lifeExp)) + geom_boxplot()
raw data AND boxplots
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot(outlier.colour = "hotpink") +
geom_jitter(position = position_jitter(width = 0.1, height = 0), alpha = 1/4)
superpose a statistical summary
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
geom_jitter(position = position_jitter(width = 0.1), alpha = 1/4) +
stat_summary(fun.y = median, colour = "red", geom = "point", size = 5)
let's reorder the continent factor based on lifeExp
ggplot(gapminder, aes(reorder(x = continent, lifeExp), y = lifeExp)) +
geom_jitter(position = position_jitter(width = 0.1), alpha = 1/4) +
stat_summary(fun.y = median, colour = "red", geom = "point", size = 5)
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gapminder_0.1.0 ggplot2_1.0.0 knitr_1.10.5
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.2-4 digest_0.6.8 evaluate_0.7
## [4] formatR_1.2 grid_3.1.2 gtable_0.1.2
## [7] htmltools_0.2.6 labeling_0.3 magrittr_1.5
## [10] MASS_7.3-35 munsell_0.4.2 plyr_1.8.2
## [13] proto_0.3-10 Rcpp_0.11.6 reshape2_1.4.0.99
## [16] rmarkdown_0.5.1 scales_0.2.4 stringi_0.4-1
## [19] stringr_1.0.0 tools_3.1.2 yaml_2.1.13