forked from statOmics/biostatistics
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Exp_design_fast_1_ex.Rmd
192 lines (144 loc) · 5.96 KB
/
Exp_design_fast_1_ex.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
---
title: "Experimental Design II: replication and power exercise 1"
author: "Lieven Clement & Alexandre Segers"
date: "statOmics, Ghent University (https://statomics.github.io)"
output:
html_document:
code_download: yes
theme: cosmo
toc: yes
toc_float: yes
highlight: tango
number_sections: yes
---
<a rel="license" href="https://creativecommons.org/licenses/by-nc-sa/4.0"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
```{r}
library(tidyverse)
```
# Power
The power of a test is defined as:
$$P(p <
\alpha | H_1)$$
This is the probability to reject the nulhypothesis at the significance level $\alpha$ given that the alternative hypothesis is true.
The power depends on:
- the real effect size in the population $\mathbf{L}^T\boldsymbol{\beta}$.
- the number of observations: SE and df.
- Choice of designpoints
- Choice of significance-level $\alpha$.
We will evaluate the power using simulation.
# Rodents
A biologist examined the effect of a fungal infection on the eating behavior of rodents.
Infected apples were offered to a group of eight rodents, and sterile apples were offered to a group of 4 rodents. The amount of grams of apples consumed per kg body weight are given in the dataset below.
```{r}
rodents <- data.frame(weight = c(11,33,48,34,112,369,64,44,177,80,141,332),
group = as.factor(c(rep("treat",8), rep("ctrl",4))))
rodents
```
## Data exploration
Make visualizations: boxplot and QQ-plot.
```{r}
```
In the data exploration we do not have enough data to evaluate the assumptions.
Suppose that the assumptions are valid and that standard deviation in the
population would be equal to the ones you observed in the experiment.
# Assess the effect of fungal infection on apple consumption
We will model the data using a linear model with one dummy variable. Test the
relevant model parameter.
```{r}
```
# Power analyses
## Question 1
What is the power of the experiment if the effect size and standard deviation
in the population would be equal to the ones you observed in the experiment?
## Simulation function
We provide you with a function to simulate data similar to that of our
experiment under our model assumptions.
```{r}
simFast <- function(form, data, betas, sd, contrasts, alpha = .05, nSim = 10000)
{
ySim <- rnorm(nrow(data)*nSim,sd=sd)
dim(ySim) <-c(nrow(data),nSim)
design <- model.matrix(form, data)
ySim <- ySim + c(design %*%betas)
ySim <- t(ySim)
### Fitting
fitAll <- limma::lmFit(ySim,design)
### Inference
varUnscaled <- c(t(contrasts)%*%fitAll$cov.coefficients%*%contrasts)
contrasts <- fitAll$coefficients %*%contrasts
seContrasts <- varUnscaled^.5*fitAll$sigma
tstats <- contrasts/seContrasts
pvals <- pt(abs(tstats),fitAll$df.residual,lower.tail = FALSE)*2
return(mean(pvals < alpha))
}
```
Without going into the full code details, this function allows us to simulate
data similar to that of our experiment under our model assumptions, given the
following inputs:
- `form`: model formula for the experiment we want to simulate (e.g. "~ gewicht + dosis")
- `data`: the target dataset on which we want to base our simulations on (dataframe with colnames equal to the variable names, values equal to the design points chosen, e.g dataframe(gewicht = rep(c(1,2), each = 4), dosis = rep(c(1,2), times = 4)) )
- `betas`: the linear regression coefficients for the target dataset
- `sd`: the residual standard errors from the linear regression model fit on
the target dataset
- `contrasts`: comparison of interest, i.e. which (combination of) model
parameters we would like to assess (should be a matrix containing the parameter names as rownames (names(mod1$coefficients)), in the column the contrast that you want to test e.g matrix(c(0,1,0),ncol=1))
- `alpha`: alpha-level at which to conduct the hypothesis testing
- `nSim`: number of datasets we would like to simulate
To simulate new data based on our target dataset `data`, we will need to
fill in all the arguments to the `simFast` function.
**Hint: for the betas and sd, we will need to fit a linear model first!**
```{r, eval=FALSE}
power1 <- simFast(form = ...,
data = ...,
betas = ...,
sd = ...,
contrasts = ...,
alpha = ...,
nSim = ...)
power1
```
Interpret the outcome.
## Question 2
What would the power by if number of rodents would balanced be in both groups?
Again, we simulate a large number of new experiments. Adjust one more more
of the inputs to the `simFast` to allow for addressing this question.
```{r, eval=FALSE}
power2 <- simFast(form = ...,
data = ...,
betas = ...,
sd = ...,
contrasts = ...,
alpha = ...,
nSim = ...)
power2
```
## Question 3
How many observations would you need to pick up the treatment effect with
a power of 90%?
This will require us to establish a relationship between sample size and power.
In turn, this will require us to perform a "parameter sweep" over different
values of sample size and compute the power for each of them.
```{r,eval=FALSE}
power <- data.frame(n=seq(5,50,5),power=NA)
for (i in 1:nrow(power))
{
n1 <- n2 <- power$n[i]
predictorData <- data.frame(...)
power$power[i] <- simFast(form = ...,
data = predictorData,
betas = ...,
sd = ...,
contrasts = ...,
alpha = ...,
nSim = ...)
}
power
```
## Question 4
Suppose that we would like to pick up an effect size of $\beta_1 = 60 g/kg$.
How many samples would be required in each group to obtain a power of 90%?
Note, that
- we do a two-sided test so the sign of the effect size is arbitrary.
- the intercept in the power analysis is also arbitrary so we could also set it at 0.
```{r}
```