---
title: "Week 5. Intro to R packages"
output:
  html_document:
    toc: true
    includes:
      after_body: footer.html
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
set.seed(1234)
```
```{r echo=FALSE, message=FALSE, warning=FALSE}
library(kableExtra)
dt <- data.frame("Compartmentalized", "Documented", "Extendible", "Reproducible", "Robust")
kable(dt, col.names=NULL) %>%
  kable_styling(full_width = TRUE) %>%
  row_spec(1, bold = FALSE, color = "white", background = "blue") %>%
  column_spec(column = 1:5, width = "20%")
```
# Why a package?
## Shorter answer
* The package framework helps you write robust, well-documented code.
* It makes it easy to bundle data with code.
* It makes it easy to version and document your data.
## Longer answer
An R package is the standard (and an easy) way to organize your R code, document it, and share it with other people. Why use an R package rather than just a bunch of scripts with your data in a folder?
* **Reproducibility and documentation** In the long run, you will save yourself a lot of work if you organize and document your code. Rather than writing a series of scripts that you copy and alter for each project, you think about how to turn your scripts into functions.
* **You want to share your code** If you are writing code that can be used with different data, rather than only for your specific problem, then you want to make a package so that you can share it.
* **Robust data sharing!** Putting your data in a dedicated data package allows you to version your data (so everyone knows they are using the most up-to-date data), document your data, track data changes, provide data releases (with archives), and provide easy visualizations of the data. Any other package can then load your data package and have access to the data.
* **You want to make an application** If you want to make a Shiny application, having your code in a package will help.
# Create a simple package with RStudio
1. Open RStudio
2. In the upper right-hand corner, click the blue cube with R, and click New Project.
3. In the pop-up, click 'New Directory' and choose R Package.
4. Name your package `TestPackage` and choose the directory to put it in. Also check the box labeled 'Create git repository'.
5. Click Create Package.
That's it!!
You will see a 'Build' tab.
6. Click on the 'Build' tab in the upper right, and click 'Install and Restart'. Your package should build and load.
7. Click 'Check'. Your package won't pass the checks because the license is not set.
**If you want to use RStudio Cloud**
Open this link, [TestPackage](https://rstudio.cloud/project/3592671). You will need to log in; you can use your Google account.
# Parts of an R package
## The essentials
Two files and a directory.
* **DESCRIPTION** This file has the metadata about your package: its name, what packages it depends on, and so on. Most of it is self-explanatory. The `Depends:` and `Imports:` lines list the other packages whose functions you use in your functions.
* **NAMESPACE** This file specifies which of your package's functions are exposed to users. For our course, you won't need to edit it, as {roxygen2} takes care of it.
* **R directory** This is where all your R code goes for your package.
## Basic add-ons
* **man** A directory for documentation files. You won't need to write these; {roxygen2} creates them automatically.
* **data** A directory for data files saved in RData format with the ending `.rda` or `.RData`. Nothing else!
## Other add-ons
* `inst` folder for miscellaneous files; its contents are copied into the installed package.
* `inst/extdata` folder for external data.
* `data-raw` A directory for raw data files that produced the data files in `data` folder.
* `.Rbuildignore` optional, but in practice you will always need this.
## The default files
By default, RStudio will create the following files.
* `DESCRIPTION`
```
Package: DeleteMe
Type: Package
Title: What the Package Does (Title Case)
Version: 0.1.0
Author: Who wrote it
Maintainer: The package maintainer <[email protected]>
Description: More about what it does (maybe more than one line)
    Use four spaces when indenting paragraphs within the Description.
License: What license is it under?
Encoding: UTF-8
LazyData: true
```
* `NAMESPACE`
```
exportPattern("^[[:alpha:]]+")
```
This exports every object whose name starts with a letter, which in practice means every function in the R folder.
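`exportPattern()` takes a regular expression, so you can check which object names it would pick up with base R's `grepl()`. The names below are illustrative; `.internal_helper` is a hypothetical internal function, not part of the course package.

```r
# The pattern from the default NAMESPACE: the name must start with a letter
pattern <- "^[[:alpha:]]+"
object_names <- c("hello", "littleforecast", ".internal_helper")
exported <- grepl(pattern, object_names)
exported  # TRUE TRUE FALSE: a name that starts with a dot stays internal
```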
# Let's create a real package
We are going to use {roxygen2}, which will create our documentation. You should always use it. Don't get into the bad habit of writing functions without documentation headers!
We could use
```
usethis::create_package("../TestPackage")
```
to create our package with {roxygen2} set up, but I'll walk you through doing it manually.
## Install {roxygen2} if needed
```
install.packages("roxygen2")
```
## Delete the `NAMESPACE` file
{roxygen2} is going to create that file, so we need to get rid of the one RStudio made. If you forget, {roxygen2} will warn you and will not overwrite the old one.
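If you want to script this step, here is a minimal sketch of a hypothetical helper, `remove_non_roxygen_namespace()`, that deletes `NAMESPACE` only when {roxygen2} did not write it. It assumes that files written by {roxygen2} start with the header line `# Generated by roxygen2: do not edit by hand`, which is what current versions write; run it with the package folder as the working directory.

```r
# Hypothetical helper: delete NAMESPACE only if roxygen2 did not write it.
# Returns TRUE (invisibly) if a file was removed, FALSE otherwise.
remove_non_roxygen_namespace <- function(path = "NAMESPACE") {
  if (!file.exists(path)) return(invisible(FALSE))
  first_line <- readLines(path, n = 1, warn = FALSE)
  # roxygen2 marks the NAMESPACE files it writes in their first line
  written_by_roxygen <- length(first_line) == 1 &&
    grepl("Generated by roxygen2", first_line, fixed = TRUE)
  if (written_by_roxygen) return(invisible(FALSE))
  invisible(file.remove(path))
}
```

Calling it a second time after {roxygen2} has run leaves the generated file alone.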
## Set Project Options
1. Click on Tools > Project Options > Build Tools
2. Make sure Generate documentation with Roxygen is checked. *Don't see that?* Then you need to install the {roxygen2} package.
3. Click Configure next to the Roxygen line. Make sure all the checkboxes are checked; the last two are not checked by default.
## Add a function
1. Open `hello.R` in the R folder and delete its contents.
2. Paste this code into the script and save. The `#'` is the {roxygen2} header.
```
#' @title Hello!
#'
#' @description This function just says hello.
#'
#' @export
hello <- function(){ cat("HELLO") }
```
3. Click Install and Restart from the Build tab.
## Use your new function
Learn about your function with
```
?hello
```
Use your function with
```
hello()
```
## Add some data
1. Add a folder called `data`
2. Run these lines at the R console. With the project open, the working directory is the package folder, so the file is saved in `data/`.
```
WWW2 <- WWWusage^2
save(WWW2, file="data/WWW2.rda")
```
3. Click Install and Restart from the Build tab
4. Now your data are available from your package. Type
```
WWW2
```
at the command line.
# Add a more realistic function
Now we will add a function that uses another R package.
1. Create a new R script file. File > New File > R Script.
2. Paste this code into the script and save as `littleforecast.R` in the R directory.
```
#' Forecast with Arima Model
#'
#' This fits an Arima model to data with forecast's auto.arima() function and plots
#' a forecast with the forecast() function.
#'
#' @param data A vector (time series) of data
#' @param nyears Number of time steps to forecast forward
#' @return A ggplot object showing the forecast.
#' @examples
#' dat <- WWWusage
#' littleforecast(dat, nyears=100)
#' @export
littleforecast <- function(data, nyears=10){
  fit <- forecast::auto.arima(data)
  fc <- forecast::forecast(fit, h = nyears)
  ggplot2::autoplot(fc)
}
```
This function depends on some packages: {forecast} and {ggplot2}. We need to tell our package about these dependencies.
Add this line to `DESCRIPTION` file after the `Description:` line:
```
Imports: forecast, ggplot2
```
Click Build > Install and Restart. Now we can use our function.
```
littleforecast(WWW2)
```
## Clean up the DESCRIPTION file
Let's edit our `DESCRIPTION` file to look like so:
```
Package: TestPackage
Title: This Is A Toy Package
Version: 1.3
Author: Eli Holmes
Maintainer: <[email protected]>
Description: This is a super simple toy package for students to copy and experiment with for the short course.
Depends: R (>= 3.4.1)
Imports: forecast, ggplot2
License: GPL-2
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.2
```
The packages on the Depends and Imports lines must be installed in order to install your package. If the user doesn't have them, they are normally installed along with your package. However, when you Build and Install locally, R will throw an error if any of them are missing.
* `Depends:` attaches the package when yours is loaded, so the user has all of its exported functions at the command line.
* `Imports:` lists any other R packages your package needs in order to work; their functions won't be available at the command line (unless you choose to re-export them).
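Before you click Install and Restart, you could check for missing dependencies yourself. Here is a minimal sketch of a hypothetical helper, `missing_deps()`, built on base R's `read.dcf()` (which reads the DESCRIPTION format) and `requireNamespace()`. It ignores version requirements, so a package that is installed but too old is not reported.

```r
# Hypothetical helper: list packages named in the Depends/Imports fields of a
# DESCRIPTION file that are not installed on this machine.
missing_deps <- function(desc_path = "DESCRIPTION") {
  fields <- read.dcf(desc_path, fields = c("Depends", "Imports"))
  deps <- unlist(strsplit(fields[!is.na(fields)], ","))
  deps <- trimws(sub("\\(.*\\)", "", deps))  # drop version specs such as (>= 3.4.1)
  deps <- setdiff(deps[deps != ""], "R")     # R itself is not an installable package
  # requireNamespace() returns FALSE for packages that are not installed
  # (and loads the namespace of those that are, as a side effect)
  installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
  deps[!installed]
}
```

Run `missing_deps()` from the package folder; an empty result means every package on those lines is installed.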
## Look at the NAMESPACE file
{roxygen2} made this NAMESPACE file.
```
# Generated by roxygen2: do not edit by hand

export(hello)
export(littleforecast)
```
How does {roxygen2} know to export a function? It exports any function whose {roxygen2} header includes this tag:
```
#' @export
```
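To confirm what an installed package actually exports, base R has `getNamespaceExports()`. The sketch runs it on the built-in {stats} package so it works anywhere; after Install and Restart you could pass `"TestPackage"` instead, which should list `hello` and `littleforecast`.

```r
# List the names an installed package exports.
# Swap "stats" for "TestPackage" once your package is installed.
exports <- getNamespaceExports("stats")
"median" %in% exports  # TRUE: median() is exported by {stats}
```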
# The R Directory: Function code
This is where your functions go, **and our data documentation files** too. Usually each file holds a single function. You can put multiple functions in one file, but that can get confusing unless they are small functions. The top of each function has documentation in {roxygen2} format.
```
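#' @title A little foo function
#'
#' @description This little function does this.
#'
#' @param arg1 what this argument is
#' @return what the function gives back to the user
#' @export foo
foo <- function(arg1){
  # The work goes here; this placeholder just passes arg1 through
  result <- arg1
  # Return what you want the user to get back
  return(result)
}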
```
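To see the template filled in, here is a hypothetical function, `celsius_to_fahrenheit()`, written the same way. The function and its name are illustrative only; they are not part of the course package.

```r
#' @title Convert Celsius to Fahrenheit
#'
#' @description Converts temperatures from degrees Celsius to degrees Fahrenheit.
#'
#' @param celsius a numeric vector of temperatures in degrees Celsius
#' @return a numeric vector of temperatures in degrees Fahrenheit
#' @export
celsius_to_fahrenheit <- function(celsius){
  # Fail early with a clear message instead of a cryptic arithmetic error
  if (!is.numeric(celsius)) stop("`celsius` must be numeric")
  celsius * 9 / 5 + 32
}
```

Saved as `R/celsius_to_fahrenheit.R`, it would get its own help page and a NAMESPACE entry the next time you click Install and Restart.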
# `.Rbuildignore`
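Though not required, in practice you will need to tell R which files not to include when it builds your package. RStudio makes this file for you, but you need to check it and add entries as your project grows. Each line is a regular expression, matched without regard to case against file and directory paths relative to the package's top-level folder.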
```
^.*\.Rproj$
^\.Rproj\.user$
^TestPackage\.Rcheck$
^TestPackage.*\.tar\.gz$
^TestPackage.*\.tgz$
^\.github$
^\.git$
```
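Because each line is a regular expression, an unanchored pattern such as `.git` can exclude more than you intend: the `.` matches any character. The hypothetical helper `ignored_by()` below previews which paths a set of patterns would match, using `grepl()` with `perl = TRUE` and `ignore.case = TRUE` as a stand-in for what `R CMD build` does. It only checks the paths you give it; it does not model the build dropping everything inside an ignored directory.

```r
# Hypothetical helper: which of these paths match any of the patterns?
ignored_by <- function(paths, patterns) {
  hits <- vapply(paths, function(p) {
    any(vapply(patterns, grepl, logical(1), x = p, perl = TRUE, ignore.case = TRUE))
  }, logical(1))
  paths[hits]
}

paths <- c("TestPackage.Rproj", ".github", ".gitignore", "R/digits.R", "R/hello.R")

# Unanchored ".git" also catches .gitignore and R/digits.R ("igit")
ignored_by(paths, c("^.*\\.Rproj$", ".github", ".git"))

# Anchored patterns catch only TestPackage.Rproj and .github
ignored_by(paths, c("^.*\\.Rproj$", "^\\.github$", "^\\.git$"))
```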
# Functions with pipes
1. Create a new R script file. File > New File > R Script.
2. Paste this code into the script and save as `irisaverages.R` in the R directory.
```
#' dplyr example
#'
#' This adds a new function that needs {dplyr}.
#' @param col which column to average
#' @export
irisaverages <- function(col = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")){
  col <- match.arg(col)
  # Copy the chosen column into a column literally named "col"
  # (this changes only the function's local copy of iris)
  iris$col <- iris[[col]]
  iris %>% dplyr::group_by(Species) %>%
    dplyr::summarize(mean = mean(col))
}
```
We now use {dplyr} and `%>%` (pipe).
We could add {dplyr} to `Depends` in our DESCRIPTION file, but that would attach the whole {dplyr} package for the user, and maybe we don't want to do that.
We can add {dplyr} to `Imports` instead, but then how do we get `%>%`? Add a file `import_packages.R` to the R folder (the name of the file is unimportant) containing:
```
#' @importFrom magrittr %>%
NULL
```
or add
```
#' @importFrom magrittr %>%
```
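to the header of `irisaverages.R`. Either way, also add `dplyr` and `magrittr` to the `Imports:` line of `DESCRIPTION`; R CMD check expects every package you import from to be listed there.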
*How would I ever remember this??* Sadly, if you use the `%>%` pipe, you'll get lots of practice with this. Starting with R version 4.1, [there is now a native R pipe](https://www.r-bloggers.com/2021/05/the-new-r-pipe/), `|>`, which works like `%>%` in most cases and, being part of base R, needs no import. You might want to switch to it.
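For comparison, here is a hypothetical base-R variant of `irisaverages()` written with the native pipe; it needs no imports at all. It swaps {dplyr} for `split()` and `vapply()`, so it returns a named vector rather than a tibble, and it requires R 4.1 or later.

```r
# Hypothetical base-R variant of irisaverages(): no Imports needed.
irisaverages_base <- function(col = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")){
  col <- match.arg(col)
  iris[[col]] |>
    split(iris$Species) |>    # one numeric vector per species
    vapply(mean, numeric(1))  # average each one
}

irisaverages_base("Sepal.Length")  # setosa 5.006, versicolor 5.936, virginica 6.588
```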
# Documenting data
## Add the data
Add to the `data` folder as an `.rda` or `.RData` file.
```
setosa <- subset(iris, Species=="setosa")
save(setosa, file="data/setosa.rda")
```
## Document the data
Add a file called `data-setosa.R` to the R folder. Tip: it is good to give your data documentation scripts a clear name tag (here the `data-` prefix) to distinguish them from function scripts.
```
#' @title The setosa dataset
#'
#' @description Measurements of the 50 setosa flowers in the iris dataset.
#'
#' \itemize{
#' \item Sepal.Length. length of sepals
#' \item Sepal.Width. width of sepals
#' \item Petal.Length. length of petals
#' \item Petal.Width. width of petals
#' \item Species. species name (always setosa)
#' }
#'
#' @docType data
#' @name setosa
#' @usage data(setosa)
#' @references R base package.
#' @format A data frame with 50 rows and 5 variables.
#' @keywords datasets
NULL
```
*Note:* in recent versions of {roxygen2}, you don't need `@name`; you can end the block with the object name as a string, `"setosa"`, instead of `NULL`. That only works if you use `LazyData: true` in your `DESCRIPTION` file, and you might not want to load data every time the user loads the package.
Then click Install and Restart from the Build tab.
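Typing an `\item` line for every column gets tedious for wide data. A hypothetical helper, `format_skeleton()`, can draft a roxygen `@format` block with one entry per column, using Rd's `\describe{\item{name}{description}}` list, that you then paste into your data documentation file and fill in. This is a convenience sketch, not a {roxygen2} feature.

```r
# Hypothetical helper: draft a roxygen @format block for a data frame.
format_skeleton <- function(df) {
  # First class of each column, e.g. "numeric" or "factor"
  col_classes <- unname(vapply(df, function(x) class(x)[1], character(1)))
  c(
    sprintf("#' @format A data frame with %d rows and %d variables:", nrow(df), ncol(df)),
    "#' \\describe{",
    sprintf("#'   \\item{%s}{%s. DESCRIBE ME}", names(df), col_classes),
    "#' }"
  )
}

setosa <- subset(iris, Species == "setosa")
cat(format_skeleton(setosa), sep = "\n")
```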
## More details
The name of the `.rda` file in the `data` folder is what you use to load the data. For example, let's say you have
```
save(cars1, cars2, file="data/carsdata.rda")
```
That saves two data objects to one `.rda` file. To load both data objects, you use
```
data(carsdata)
```
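You can see the file-versus-object distinction without building a package by saving and reloading a `.rda` file in a temporary folder. In this sketch, `cars1` and `cars2` are hypothetical stand-ins built from the horsepower (`hp`) column of the built-in `mtcars` data.

```r
# Two objects saved in one file named carsdata.rda
cars1 <- mtcars[1:16, "hp", drop = FALSE]
cars2 <- mtcars[17:32, "hp", drop = FALSE]
rda_file <- file.path(tempdir(), "carsdata.rda")
save(cars1, cars2, file = rda_file)

env <- new.env()
loaded <- load(rda_file, envir = env)  # load() returns the names of the restored objects
loaded                                 # "cars1" "cars2"
exists("carsdata", envir = env, inherits = FALSE)  # FALSE: carsdata is only the file name
```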
What do I document: `cars1`, `cars2` or `carsdata`? You can actually do whatever you want.
Do this to show this documentation with `?cars2`.
```
#' @title a dataset of horsepower for different cars
#'
#' @docType data
#' @name cars2
NULL
```
Do this to show this documentation with `?cars1`, `?cars2`, and `?carsdata`
```
#' @title some datasets of horsepower for different cars
#'
#' @docType data
#' @name carsdata
#' @aliases cars1 cars2
NULL
```
Do this to show this documentation with `?carsdata`.
```
#' @title two datasets of horsepower for different cars
#'
#' @docType data
#' @name carsdata
NULL
```
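The next form, which ends the block with the object name as a string instead of `NULL`, only works for data that are exported. That means `LazyData: true`, and only the objects that `data(carsdata)` loads (here `cars1` and `cars2`).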
```
#' @title a dataset of horsepower for different cars
#'
#' @docType data
"cars2"
```
So this fails, because `carsdata` is not an exported object; it is just the name of the data file.
```
#' @title two datasets of horsepower for different cars
#'
#' @docType data
"carsdata"
```
# References
If/when you want to go into R packaging in more depth, see Hadley Wickham's book [R Packages](http://r-pkgs.had.co.nz/). However, for simple packages you don't need the book.
# Next Week
* Ways to share your package
* How to share your package on GitHub
* More on documenting your package functions and data
* Creating a nifty package landing page with all the documentation (one line of code!)
* Easy intro to GitHub Actions