forked from lmullen/dh-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
setup.Rmd
116 lines (72 loc) · 10.1 KB
/
setup.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
---
title: "Setting up R"
---
How to set up R.
## Installing R
### On Ubuntu
R can be installed on Ubuntu using the package manager. The version in the Ubuntu repositories is almost certainly out of date, so the best thing to do is to add the CRAN repository using the RStudio mirror. (You will have to run most of these commands as root, prefixing them with the command `sudo`.)
```
add-apt-repository "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/"
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 51716619E084DAB9
```
You should then be able to update the list of Ubuntu packages and install the necessary R packages:
```
apt-get install r-base r-base-core r-base-dev
```
You can then test whether R is properly installed by running the following command in your terminal:
```
R --version
```
For some tasks in R, like package development, you'll need many more external dependencies. The following will take a long time to download and install, but it will provide almost all of what you need.
```
apt-get install libcurl4-openssl-dev libxml2-dev qpdf texlive-full build-essential gdal-bin libgdal-dev libproj-dev libxml2-dev git
```
## Installing R Studio
## Packages and Libraries
A **package** is a collection of R files bundled together to accomplish a particular purpose. Usually these packages provide a set of functions to provide a particular kind of analysis. The [dplyr](http://cran.rstudio.com/web/packages/dplyr/) package, for example, provides a set of functions for manipulating data, and the [rgdal](http://cran.rstudio.com/web/packages/rgdal/) package provides a set of functions for working with geospatial files. Because R is especially good at data analysis, it is also common to provide data packages which wrap generally useful datasets. R has a built in [datasets](http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html) package which provides a set of sample datasets, and the [babynames](https://github.com/hadley/babynames) package provides data about births from the Social Security Administration.^[For a fuller description of packages, see the introduction and chapter 1 of Hadley Wickham, *[R Packages](http://r-pkgs.had.co.nz/)* (O'Reilly, forthcoming).]
A **library** is a collection of packages stored on your computer. R has a built-in library which stores the packages which come bundled with base R. You will also create a personal package library which stores the packages that you have installed. You needn't worry about where this library is on your computer: R will prompt you for permission to create it the first time that you install packages. (You can find where it is by running the function `.libPaths()`. When you work on specific analysis projects, it is desirable to keep a separate library of packages uses just for that project in the project directory. This will ensure that you can later re-run your analysis. This can be managed using the [Packrat](http://rstudio.github.io/packrat/) package, which we'll cover later. (See the chapter on [reproducible research](reproducible.html).)
## Installing packages
You won't get very far in doing historical work with R without installing a few packages. This section will guide you in installing a few basic packages which you can't live without. Then it will suggest some packages categorized by the type of analysis that you wish to do.
There are two main places from which you can install packages. [CRAN](http://cran.rstudio.com/), the Comprehensive R Archive Network, is a central repository of R packages.^[There are other CRAN-like repositories, such as [Bioconductor](http://www.bioconductor.org/) which provide discipline specific packages, but these are all for scientists.] To be published on CRAN, packages have to go through a rigorous testing process. You can be fairly confident that a CRAN package is, technically speaking at least, reliable. R users also have a culture oriented around academic research, so you can be more confident with CRAN packages than with the repositories for most languages that the functions provided are academically rigorous, and in some cases at least, peer reviewed. CRAN provides a wide array of packages, for statistics, business, science, social science, geospatial analysis, network analysis, text mining, natural language processing, visualization, and other domains. Most of the packages are not targeted specifically at historians (you can help change this!) but many can be used by historians. You can browse the [list of packages available at CRAN by name](http://cran.rstudio.com/web/packages/available_packages_by_name.html), or can browse the [task views](http://cran.rstudio.com/web/views/) which provide guides to packages relevant to a particular analytical technique.
R includes a function that lets you install packages from CRAN. Running the function below will install the package [devtools](https://github.com/hadley/devtools), which provides many functions useful for writing R code. If this is the first time that you have installed an R package, you may be prompted to create a personal package library; just accept whatever the default location is for your operating system.
```{r eval=FALSE}
install.packages("devtools")
```
There are two common operations that you might want to do with `install.packages()`. One is to install more than one package at a time. You can combine the names of the packages that you want to install into a character vector. In this example, we'll install the [httr](http://cran.r-project.org/web/packages/httr/index.html) package, which lets you make requests from R over the internet via HTTP, and the [jsonlite](http://cran.r-project.org/web/packages/jsonlite/index.html) package, which lets you read the JSON data format which is commonly used to exchange data through web applications: `c("httr", "jsonlite")`. You might also want to install all the dependencies that a package needs. By default when using `install.packages()` to install package A, you will also install the packages B and C without which A cannot work. But packages will also have other dependencies which add additional functionality. Specifying `dependencies = TRUE` will install all the packages which A can possibly use: B and C, but perhaps also D and E. Putting those two arguments together, we can install httr and jsonlite with all their dependencies:
```{r eval=FALSE}
install.packages(c("httr", "jsonlite"), dependencies = TRUE)
```
The second main place to install packages is from [GitHub](https://github.com/), a website for sharing open-source code. Many R packages are developed on GitHub, so you can install more recent versions of them than are available on CRAN. Other packages are distributed only informally via GitHub rather than formally via CRAN. Such packages can be installed using the `install_github()` function provided by the devtools package we installed earlier.
Let's install the [historydata](https://github.com/lmullen/historydata) package which provides sample datasets of interest to historians. This package will give us something to work with later. You can [find this package](https://github.com/lmullen/historydata) at GitHub. Notice that the README provides instructions on how to install the package. But even if it didn't, we could find out how to install the package from its URL. Notice the URL: `https://github.com/lmullen/historydata`. After GitHub's domain name, there are two parts to the URL. The username of the person or organization who controls this package is first: `lmullen`. Next is the name of the repository (which probably should be the same as the name of the package, but which might not be): `historydata`. We can use that information to install the file with devtools.^[The double colon operator `::` lets us specify which package a function comes from. In this case we are using it as a shortcut so we don't have to load devtools into the global environment.]
```{r eval=FALSE}
devtools::install_github("lmullen/historydata")
```
Now using `install.packages()` and `devtools::install_github()` you'll be able to install the vast majority of the packages that are useful to you.
There is one more wrinkle to consider. R is a high-level language which uses libraries provided by low-level languages. For example, the [RCurl](http://cran.r-project.org/web/packages/RCurl/index.html) package (which many R packages need to interact with the internet) requires the [libcurl](http://curl.haxx.se/libcurl/) library which is external to R. The [rgdal](http://cran.rstudio.org/web/packages/rgdal/) package depends on the [GDAL/OGR](http://www.gdal.org/) Geospatial Data Abstraction Library. When looking at the documentation for package, note if it requires any external libraries. Those external libraries will have to be installed on your operating system. On Mac OS X, this is most easily done with [Homebrew](http://brew.sh/); on Ubuntu and other Linux distributions you can use the package manager; on Windows you will have to follow the instructions from the developers of the external library. In the list of suggested R packages below, I have mentioned the any external libraries which are required.
## Suggested packages
The following are some suggested packages for historical analysis. We'll use most of these throughout this book.
General R functionality:
- [devtools](http://cran.rstudio.org/web/packages/devtools/)
- [packrat](http://cran.rstudio.org/web/packages/packrat/)
- [knitr](http://cran.rstudio.org/web/packages/knitr/)
- [RMarkdown](http://rmarkdown.rstudio.com/)
Data:
- [historydata](https://github.com/lmullen/historydata)
Web:
- [httr](http://cran.rstudio.org/web/packages/httr/)
- [rmetadata](https://github.com/ropensci/rmetadata)
Visualization:
- [ggplot2](http://cran.rstudio.org/web/packages/ggplot2/)
- [ggvis](http://ggvis.rstudio.com/)
Data manipulation:
- [dplyr](http://cran.rstudio.org/web/packages/dplyr/)
- [magrittr](http://cran.rstudio.org/web/packages/magrittr/)
Geospatial and mapping:
- [ggmap](http://cran.rstudio.org/web/packages/ggmap/)
- [sp](http://cran.rstudio.org/web/packages/sp/)
- [maptools](http://cran.rstudio.org/web/packages/maptools/)
- [rgeos](http://cran.rstudio.org/web/packages/rgeos/)
- [rgdal](http://cran.rstudio.org/web/packages/rgdal/)
## Updating Packages
## Citing packages
## .Rprofile