---
title: "Exercise 9.2: Non-parametric test on the lettuce dataset"
author: "Lieven Clement and Jeroen Gilis"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
# The lettuce dataset
In a previous tutorial, we analysed the dataset on
lettuce plants using ANOVA. However, it was not clear
whether all the assumptions of ANOVA were met. Indeed, with
only 7 data points per group, it is very hard to assess
the assumptions of normality and equal variances.
Therefore, we will re-analyse the dataset using the
non-parametric alternative to ANOVA, the Kruskal-Wallis test.

We will first give a concise overview of what we saw in the
ANOVA analysis, which can be found in the
`ANOVA_lettuce_plants.Rmd` file.

The researchers want to find out whether biochar, compost and
a combination of both biochar and compost have an influence
on the growth of lettuce plants. To this end, they grew
lettuce plants in a greenhouse. The pots were filled with
one of four soil types:

1. Soil only (control)
2. Soil supplemented with biochar (refoak)
3. Soil supplemented with compost (compost)
4. Soil supplemented with both biochar and compost (cobc)

The dataset `freshweight_lettuce.txt` contains the fresh weight
(in grams) of 28 lettuce plants (7 per condition).
Load the required libraries
```{r, message = FALSE}
library(tidyverse)
library(coin) # provides kruskal_test(), wilcox_test() and pvalue() used below
```
# Data import
```{r}
lettuce <- read_csv("https://raw.githubusercontent.com/statOmics/PSLSData/main/freshweight_lettuce.txt")
```
Take a glimpse at the data
```{r}
glimpse(lettuce)
```
# Data tidying
```{r}
## set treatment to factor
## ...
```
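One possible way to do this is sketched below on a small made-up data frame. The object `toy` and its values are hypothetical, and the level names are assumed to match the labels in the list of soil types above.

```r
library(dplyr)

# Hypothetical stand-in for the lettuce data; the values are made up.
toy <- tibble(
  treatment   = rep(c("control", "refoak", "compost", "cobc"), each = 2),
  freshweight = c(38, 41, 40, 43, 48, 52, 50, 47)
)

# Convert the character column to a factor with an explicit level order,
# so that the control group comes first.
toy <- toy %>%
  mutate(treatment = factor(treatment,
                            levels = c("control", "refoak", "compost", "cobc")))

levels(toy$treatment)
```

Setting the levels explicitly keeps the control group as the first level, which also fixes the order of the groups in plots and pairwise comparisons.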
# Data exploration
```{r}
## Count the number of observations per treatment
```
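A minimal sketch of such a count, again on a made-up data frame (`toy` is hypothetical and not the real dataset):

```r
library(dplyr)

# Hypothetical stand-in for the lettuce data; the values are made up.
toy <- tibble(
  treatment   = rep(c("control", "refoak", "compost", "cobc"), each = 2),
  freshweight = c(38, 41, 40, 43, 48, 52, 50, 47)
)

# One row per treatment, with the number of observations in column n.
counts <- toy %>%
  count(treatment)

counts
```

Applied to `lettuce`, the same call should report 7 plants per treatment if the data match the design described above.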
Now let's make a boxplot displaying the freshweight
of each treatment condition:
```{r}
# ...
```
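With so few observations per group, it can help to draw the individual data points on top of the boxes. The sketch below does this with `ggplot2` on a made-up data frame (`toy` and its values are hypothetical); the axis labels are only a suggestion.

```r
library(ggplot2)

# Hypothetical stand-in for the lettuce data; the values are made up.
toy <- data.frame(
  treatment   = factor(rep(c("control", "refoak", "compost", "cobc"), each = 3)),
  freshweight = c(38, 41, 36, 40, 43, 39, 48, 52, 50, 51, 47, 49)
)

p <- ggplot(toy, aes(x = treatment, y = freshweight, fill = treatment)) +
  geom_boxplot(outlier.shape = NA) + # hide outlier symbols, the points are drawn below
  geom_jitter(width = 0.2) +         # show every observation
  labs(x = "Treatment", y = "Fresh weight (g)") +
  theme_bw()

p
```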
Interpret the visualization!
In the analysis in chapter 7 (`ANOVA_lettuce_plants_half.rmd` file),
we accepted the assumptions for analysing the data with an ANOVA,
even though, with only 7 values per group, the assumptions of
normality and equal variances are very hard to assess.
Therefore, we will now re-analyse the dataset using the
non-parametric alternative to ANOVA: the Kruskal-Wallis test.
# Kruskal-Wallis rank test
## Hypotheses
Formulate a correct null and alternative hypothesis for the Kruskal-Wallis test in this analysis.
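As a starting point, one possible formulation is given below. It assumes we do not want to rely on the location-shift model. The notation is introduced here for illustration only: $F_j$ is the distribution of the fresh weight under treatment $j$, and $Y_j$ is the fresh weight of a random plant from treatment $j$.

$$
H_0: F_{\text{control}} = F_{\text{refoak}} = F_{\text{compost}} = F_{\text{cobc}}
$$

$$
H_1: \exists\, j \neq k : \text{P}[Y_j \geq Y_k] \neq 0.5
$$

In words: under the null hypothesis the fresh weight distribution is the same for all four treatments; under the alternative there is at least one pair of treatments for which a plant from one treatment is more (or less) likely than 50% to weigh at least as much as a plant from the other.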
## Analysis
```{r}
# set.seed(1)
# kw <- kruskal_test(...)
# kw
```
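The commented call above can be fleshed out in several ways. The sketch below is one of them, run on simulated data rather than the real measurements. The use of an approximate permutation distribution with 10000 resamples is an assumption, chosen because the `set.seed(1)` hint suggests a random resampling step.

```r
library(coin)

set.seed(1) # the approximate p-value below relies on random resampling

# Simulated stand-in for the lettuce data (not the real measurements).
toy <- data.frame(
  treatment   = factor(rep(c("control", "refoak", "compost", "cobc"), each = 7)),
  freshweight = rnorm(28, mean = rep(c(40, 40, 50, 50), each = 7), sd = 4)
)

# Kruskal-Wallis test with a Monte Carlo approximation of the permutation null.
kw <- kruskal_test(freshweight ~ treatment, data = toy,
                   distribution = approximate(nresample = 10000))

kw
pvalue(kw)
```

Leaving out the `distribution` argument gives the default asymptotic chi-squared approximation, which may be less reliable with only 7 observations per group.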
Interpret the results!
# Post-hoc analysis
We will perform a post-hoc analysis with pairwise Wilcoxon rank
sum tests. As we do not want to assume a location shift, we
will interpret the outcome in terms of probabilistic indices.
Note that we will need to correct the resulting
p-values for multiple testing.
## Hypotheses
Formulate a correct null and alternative hypothesis for the Wilcoxon test post-hoc analysis.
## Analysis
```{r}
## pairwise.wilcox.test(...)
```
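A minimal sketch of how `pairwise.wilcox.test()` from base R can be called, using simulated data instead of the real measurements. The Bonferroni correction is an assumption here; other methods accepted by `p.adjust()` work in the same way.

```r
set.seed(1)

# Simulated stand-in for the lettuce data (not the real measurements).
toy <- data.frame(
  treatment   = factor(rep(c("control", "refoak", "compost", "cobc"), each = 7)),
  freshweight = rnorm(28, mean = rep(c(40, 40, 50, 50), each = 7), sd = 4)
)

# First argument: response vector; second argument: grouping factor.
pw <- pairwise.wilcox.test(toy$freshweight, toy$treatment,
                           p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values, one per pair of treatments.
pw$p.value
```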
What do you observe?
```
## Alternative: calculate the p-value for each pair of treatments with wilcox_test() (coin package)
library(coin)
treatments <- levels(lettuce$treatment)
pvalues <- combn(treatments, 2, function(x) {
  ## Pairwise Wilcoxon test on the two treatments in x (unused factor levels are dropped)
  test <- wilcox_test(
    freshweight ~ treatment,
    data = droplevels(subset(lettuce, treatment %in% x)),
    distribution = "exact"
  )
  ## Get and store the p-value of the test
  pvalue(test)
})
## Adjust for multiple testing
pvalues_bonf <- p.adjust(pvalues, method = "bonferroni")
## Link each p-value with the corresponding pairwise comparison
names(pvalues_bonf) <- combn(treatments, 2, paste, collapse = "_VS_")
pvalues_bonf
```
Interpret.
Based on the chunk of code above, can you extract the point estimates
for the probabilistic indices? Interpret those as well.
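As a hint, the statistic `W` reported by base R's `wilcox.test()` counts the pairs in which an observation from the first group exceeds one from the second group (ties count for one half). Dividing it by the number of pairs therefore estimates the probabilistic index. The sketch below uses two small made-up vectors (`a` and `b` are hypothetical) and base R's `wilcox.test()` rather than `wilcox_test()` from coin.

```r
# Two hypothetical groups; the values are made up for illustration.
a <- c(1.1, 2.3, 3.8)
b <- c(2.0, 4.1, 5.2)

# W = number of (a, b) pairs with a > b.
w <- wilcox.test(a, b)$statistic

# Estimated probability that a random observation from a exceeds one from b.
pi_hat <- unname(w) / (length(a) * length(b))

# 2 of the 9 pairs have a > b, so pi_hat equals 2/9.
pi_hat
```

The same division can be applied to each pair of treatments inside the `combn()` loop, with the two group sizes taken from the subsetted data.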
# Conclusion
Formulate a proper conclusion that answers the research hypothesis.