---
title: "Vectors, lists, and data frames"
date: "27 September 2022"
---
This R Markdown file provides a brief review of some building blocks that we will rely on throughout this course. Make sure to finish the R preparatory course, which is linked on the Moodle page of the course.
### Using R as a calculator
The most basic use of R is as a calculator:
```{r}
10 / 2
sqrt(100) + sqrt(9)
exp(1)
2^3
```
### Objects and operators
What makes R very powerful is that you can store results as "objects":
```{r}
x <- 5
y <- 10
```
If you look at the `Environment` panel in your RStudio session, you can see that these numbers are stored in memory.
Then you can do operations with them, the same way you would do with numbers:
```{r}
x * y
```
You can also save combinations of objects as new objects:
```{r}
z <- x * y
z
```
You can also modify existing objects.
```{r}
x <- x + 1
x
```
Note that we've used the `<-` sign to assign values to objects. That's the *assignment* operator. Using `<-` instead of `=` also emphasizes that the `=` used in programming is conceptually not a mathematical equals sign.
```{r}
x = x + 1
x
```
You can also use `=`, although `<-` is generally preferred. There is a more technical reason for this preference, but a simpler one is that it avoids confusion with `==`, which is used to compare objects:
```{r}
2 == 2
c(1, 2, 3) == 2
```
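Part of the more technical reason shows up inside function calls. The following is a minimal sketch; the object name `demo_scores` is made up for this illustration. Inside a call, `=` matches an argument by name, whereas `<-` creates an object as a side effect.

```{r}
# Inside a call, `=` matches the argument named `x`; no new object is created
mean(x = c(2, 4, 6))

# Inside a call, `<-` first creates `demo_scores`, then passes its value to mean()
mean(demo_scores <- c(2, 4, 6))
demo_scores
```

Keeping `<-` for assignment and `=` for function arguments makes the two roles easy to tell apart when reading code.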
`==` is a *comparison operator*, meaning it outputs `TRUE` or `FALSE`. Other comparison operators, and the logical operators `|` and `&` that combine their results, are:
```{r}
1 != 2 # not equal to
2 < 2 # less than
2 <= 2 # less than or equal to
2 > 2 # greater than
2 >= 2 # greater than or equal to
(2 < 2) | (2 <= 2) # or
(2 < 2) & (2 <= 2) # and
```
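These operators also work element by element on vectors. A small sketch, assuming a made-up vector `demo_values` that is not part of the course material:

```{r}
# Hypothetical vector, used only for this illustration
demo_values <- c(1, 5, 10)

demo_values > 2 & demo_values < 8   # element-wise "and"
demo_values < 2 | demo_values > 8   # element-wise "or"
!(demo_values > 2)                  # "not" flips TRUE and FALSE
any(demo_values > 8)                # is at least one element TRUE?
all(demo_values > 0)                # are all elements TRUE?
```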
### Data types
R has many data types, but the most common ones we'll use are:
1. numeric: `1.1`, `3`, `317`, `Inf`...
2. logical: `TRUE` or `FALSE`
3. character: `"this is a character"`, `"hello world!"`...
4. factor: `Democrat`, `Republican`, `Socialist`, ...
A small trick regarding logical values is that they correspond to `1` and `0`. This will come in handy to count the number of `TRUE` values in a vector.
```{r}
x <- c(TRUE, TRUE, FALSE)
x * 2
sum(x)
```
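The same trick counts how many elements satisfy a condition. A minimal sketch with made-up temperatures (`demo_temps` is a hypothetical name):

```{r}
# Hypothetical temperatures, made up for this illustration
demo_temps <- c(18, 25, 31, 12, 27)

demo_temps > 20        # a logical vector
sum(demo_temps > 20)   # how many values exceed 20
mean(demo_temps > 20)  # proportion of values that exceed 20
```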
There are a few special values: `NA`, which denotes a missing value, and `NaN`, which means "not a number" (for example, the result of `0 / 0`). The values `Inf` and `-Inf` are considered numeric. `NULL` denotes a value that is undefined.
```{r}
0 / 0 # NaN
1 / 0 # Inf
x <- c(1, NA, 0)
x
```
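Missing values propagate through most computations, so it helps to know how to detect and skip them. A short sketch, using a made-up vector `demo_na`:

```{r}
# Hypothetical vector with one missing value
demo_na <- c(1, NA, 0)

is.na(demo_na)               # which entries are missing?
mean(demo_na)                # NA: the missing value propagates
mean(demo_na, na.rm = TRUE)  # drop missing values before averaging
NA == NA                     # NA, not TRUE; use is.na() to test for missingness
is.null(NULL)                # NULL has its own test
```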
Probably one of the most useful functions in R is `str`. It displays the internal structure of an object.
```{r}
str(x)
```
Of course you can always print the object in the console:
```{r}
print(x)
```
Note that `print` here is a function: it takes a series of arguments (in this case, the object `x`) and returns a value (here, `x` itself, which is displayed in the console).
This is equivalent to just typing the name of the object in the console. (Behind the scenes, R calls the default function for printing this object, which in this case is just `print`.)
```{r}
x
```
You can find out the data type for each object in `R` using the function `class`, or functions that start with `is.` and then the data type:
```{r}
class("hello world!")
class(42)
is.numeric("hello world!")
is.character("hello world")
class(c(1, NA, 0))
is.numeric(c(1, NA, 0))
```
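Data types can also be converted. As a rough sketch (the name `demo_mixed` is made up for this illustration), mixing types in `c` silently converts all values to a common type, while the `as.` functions convert explicitly:

```{r}
# Mixing types in c() coerces everything to a common type (here: character)
demo_mixed <- c(1, "a", TRUE)
class(demo_mixed)

# Explicit conversion with the as.* functions
as.numeric("42")
as.character(42)
as.numeric(TRUE)
```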
`str` is also helpful for other data types, such as factors:
```{r}
str(as.factor(c("Blue", "Blue", "Red")))
```
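A factor stores its categories as levels and records each value as an integer code. A brief sketch, with a made-up factor `demo_colors`:

```{r}
# Hypothetical factor, made up for this illustration
demo_colors <- as.factor(c("Blue", "Blue", "Red"))

levels(demo_colors)      # the distinct categories
as.integer(demo_colors)  # the integer codes stored internally
table(demo_colors)       # counts per category
```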
### Data structures
Building on the data types we've learned, *data structures* combine multiple values into a single object. Some common data structures in `R` include:
1. vectors: sequence of values of a certain type
2. data frame: a table of vectors, all of the same length
3. list: collection of objects of different types
#### Vectors
We've already seen vectors created by **c**ombining multiple values with the `c` command:
```{r}
student_names <- c("Bill", "Jane", "Sarah", "Fred", "Paul")
math_scores <- c(80, 75, 91, 67, 56)
verbal_scores <- c(72, 90, 99, 60, 68)
```
There are shortcuts for creating vectors with certain structures, for instance:
```{r}
nums1 <- 1:100
# -10, -5, 0, ..., 100
nums2 <- seq(-10, 100, by = 5)
# 467 equally spaced numbers between -10 and 100
nums3 <- seq(-10, 100, length.out = 467)
```
Notice that we used `seq` to generate both `nums2` and `nums3`. The different behavior is controlled by which arguments (e.g. `by`, `length.out`) are supplied to the function `seq`.
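To check what each argument does, one option is to look at the length of the result. The sketch below also shows `rep`, another base R shortcut that is not used elsewhere in this file:

```{r}
length(seq(-10, 100, by = 5))            # 23 values, spaced 5 apart
length(seq(-10, 100, length.out = 467))  # exactly 467 values
rep(c("a", "b"), times = 3)              # repeat the whole vector three times
rep(c("a", "b"), each = 3)               # repeat each element three times
```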
With vectors we can carry out some of the most fundamental tasks in data analysis, such as descriptive statistics
```{r}
mean(math_scores)
min(math_scores - verbal_scores)
summary(verbal_scores)
```
and plots.
```{r}
plot(x = math_scores, y = verbal_scores)
text(x = math_scores, y = verbal_scores, labels = student_names)
```
It's easy to pull out specific entries in a vector using `[]`. For example,
```{r}
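math_scores[3]                           # third entry
math_scores[1:3]                         # first three entries
math_scores[-c(4:5)]                     # everything except entries 4 and 5
math_scores[which(verbal_scores >= 90)]  # math scores where the verbal score is at least 90
math_scores[3] <- 92                     # replace the third entry
math_scores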
```
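A logical vector can also be placed inside `[]` directly, without `which`. A minimal sketch with made-up vectors (`demo_math` and `demo_verbal` are hypothetical names):

```{r}
# Hypothetical vectors, made up for this illustration
demo_math <- c(80, 75, 91)
demo_verbal <- c(72, 90, 99)

demo_math[demo_verbal >= 90]            # logical indexing, no which() needed
demo_math[c(TRUE, FALSE, TRUE)]         # keep the positions marked TRUE
demo_math[demo_math > mean(demo_math)]  # values above the mean
```

The two forms differ when the condition contains `NA`: `which` drops those positions, whereas logical indexing returns `NA` for them.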
#### Data frames
Data frames allow us to combine many vectors of the same length into a single object.
```{r}
students <- data.frame(student_names, math_scores, verbal_scores)
students
summary(students)
```
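Entries of a data frame can be pulled out by row and column. The following is a sketch on a small made-up data frame (`demo_df` and its columns are hypothetical):

```{r}
# Hypothetical data frame, made up for this illustration
demo_df <- data.frame(name = c("Ann", "Bob", "Cy"),
                      score = c(80, 75, 91))

str(demo_df)                   # column types and first values
nrow(demo_df)                  # number of rows
demo_df$score                  # one column as a vector
demo_df[2, "score"]            # row 2, column "score"
demo_df[demo_df$score > 78, ]  # rows that satisfy a condition
```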
Notice that `student_names` is a different class (character) than `math_scores` (numeric), yet a data frame combines their values into a single object. We can also add new variables to an existing data frame:
```{r}
students$final_scores <- 0
students$final_scores <- (students$math_scores + students$verbal_scores) / 2
students
```
```{r}
age <- c(18, 19, 20, 21, 22)
students2 <- data.frame(student_names, age)
students2
```
And merge them with other data frames, such as `students2` created above (here based on the shared `student_names` column):
```{r}
# merge different data frames
students3 <- merge(students, students2)
students3
```
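By default, `merge` joins on all columns that the two data frames share and keeps only rows that match in both. The sketch below, on made-up data frames (`demo_scores_df` and `demo_ages_df` are hypothetical), names the key column explicitly with `by` and uses `all.x = TRUE` to keep unmatched rows from the first data frame:

```{r}
# Hypothetical data frames, made up for this illustration
demo_scores_df <- data.frame(name = c("Ann", "Bob", "Cy"), score = c(80, 75, 91))
demo_ages_df <- data.frame(name = c("Ann", "Bob"), age = c(18, 19))

# Default behaviour: only names present in both data frames are kept
demo_inner <- merge(demo_scores_df, demo_ages_df, by = "name")
demo_inner

# all.x = TRUE keeps every row of the first data frame and fills gaps with NA
demo_left <- merge(demo_scores_df, demo_ages_df, by = "name", all.x = TRUE)
demo_left
```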
#### Lists
Lists are an even more flexible way of combining multiple objects into a single object. As you will see throughout the course, we will use lists to store the output of our scraping steps. Using lists, we can combine vectors of different lengths:
```{r}
list1 <- list(some_numbers = 1:10, some_letters = c("a", "b", "c"))
list1
```
or even vectors and data frames, or multiple data frames:
```{r}
schools <- list(school_name = "LSE", students = students,
                faculty = data.frame(name = c("Kelly Jones", "Matt Smith"),
                                     age = c(41, 55)))
schools
```
You can access a list component in several different ways:
```{r}
schools[[1]]
schools[["faculty"]]
schools$students
schools[["students"]]
```
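Single and double brackets behave differently on lists. A minimal sketch with a made-up list (`demo_list` is a hypothetical name): single brackets return a smaller list, whereas double brackets and `$` return the component itself.

```{r}
# Hypothetical list, made up for this illustration
demo_list <- list(numbers = 1:3, letters = c("a", "b"))

class(demo_list[1])    # "list": single brackets return a smaller list
class(demo_list[[1]])  # "integer": double brackets return the component itself
names(demo_list)       # names of the components
length(demo_list)      # number of components
demo_list$numbers[2]   # chained access: second entry of `numbers`
```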