-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
146 lines (114 loc) · 5.38 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
---
output: github_document
editor_options:
markdown:
wrap: 77
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/"
)
```
# sparrow <img src="man/figures/sparrow.png" height="150" align="right"/>
<!-- badges: start -->
[![R build
status](https://github.com/lianos/sparrow/workflows/R-CMD-check/badge.svg)](https://github.com/lianos/sparrow/actions)
![pkgdown](https://github.com/lianos/sparrow/workflows/pkgdown/badge.svg)
[![Project Status:
Active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![Lifecycle:
stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://www.tidyverse.org/lifecycle/#stable)
[![codecov](https://codecov.io/gh/lianos/sparrow/branch/main/graph/badge.svg)](https://codecov.io/gh/lianos/sparrow)
<!-- badges: end -->
`sparrow` was built to facilitate the use of gene sets in the analysis of
high throughput genomics data (primarily RNA-seq). Analysts can orchestrate
any number of GSEA methods across a specific contrast using the unified
interface provided by the `seas`. A shiny application is provided via the
[sparrow.shiny](https://github.com/lianos/sparrow.shiny) package that enables
the interactive exploration of of GSEA results.
* The `sparrow::seas` function is a wrapper that orchestrates the execution
of any number of user-specified gene set enrichment analyses (GSEA) over
a particular experimental contrast of interest. This will create a
`SparrowResult` object which stores the results of each GSEA method
internally, allowing for easy query and retrieval.
* `{sparrow}` provides a number of convenience functions to retrieve gene
sets for different organisms as a
[`BiocSet`](https://bioconductor.org/packages/BiocSet) from a number of
popular resources, ie:
- `getMSigCollection()` to get gene sets from
[MSigDB](http://software.broadinstitute.org/gsea/msigdb/)
(using the `{msigdbr}`) package.
- `getKeggCollection()` to retrieve [KEGG](https://www.genome.jp/kegg/)
gene sets (using the KEGG query/retrieval functions in `{limma}`).
- `getPantherCollection()` to retrieve entreis from
[`PANTHER.db`](http://pantherdb.org)
* A sister [`{sparrow.shiny}`](https://github.com/lianos/sparrow.shiny)
package provides an `explore` function, which is invoked on
`SparrowResult` objects returned from a call to `seas()`. The shiny
application facilitates interactive exploration of these GSEA results.
This application can also be deployed to a shiny server and can be
initialized by uploading a serialized `SparrowResult` `*.rds` file.
Full details that outline the use of this software package is provided in the
[package's vignette](https://lianos.github.io/sparrow/articles/sparrow.html),
however a brief description is outlined below.
## Usage
A subset of the RNA-seq data tumor/normal samples in the BRCA indication from
the TCGA are provided in this package. We will use that data to perform a
"camera" and "fry" gene set enrichment analysis of tumor vs normal samples
using the MSigDB hallmark gene set collection with `sparrow::seas()`.
```{r example, message=FALSE, warning=FALSE}
library(sparrow)
library(dplyr)
bsc <- getMSigCollection("H", species = "human", id.type = "entrez")
vm <- exampleExpressionSet(dataset = "tumor-vs-normal", do.voom = TRUE)
mg <- seas(vm, bsc, methods = c("cameraPR", "fry"),
design = vm$design, contrast = "tumor")
```
We can view the top "camera" results with the smallest pvalues like so:
```{r}
results(mg, "cameraPR") %>%
arrange(pval) %>%
select(name, padj) %>%
head
```
The shift in expression of the genes within the top gene set can be
visualized with the `iplot` function below. This plot produces interactive
graphics, but rasterized versions are saved for use with this `README` file:
```{r, eval = FALSE}
iplot(mg, 'HALLMARK_MYC_TARGETS_V1', type = "density")
```
<img src="man/figures/README_iplot_density.png"/>
```{r, eval = FALSE}
iplot(mg, 'HALLMARK_MYC_TARGETS_V1', type = "gsea")
```
<img src="man/figures/README_iplot_gsea.png"/>
When these plots are rendered in your workspace or an Rmarkdown document, the
user can hover of the genes (dots) to see their name and differential
expression statistics.
For an immersive, interactive way to explore the GSEA results, use the
`sparrow.shiny::explore(mg)` method!
## Installation
This is the development version of the R/bioconductor package `{sparrow}`. It
may contain unstable or untested new features. If you are looking for the
release version of this package please go to its official Bioconductor
landing page and follow the instructions there to install it.
You can install this development version using the `{BiocManager}` CRAN
package:
```{r eval=FALSE}
BiocManager::install("sparrow", version = "devel")
```
Alternatively, you can install it from GitHub using the `{remotes}` package.
```{r eval=FALSE}
remotes::install_github("lianos/sparrow")
```
To install the shiny bits for this package, you can install the
`{sparrow.shiny}` in a similar way as described above.
## Historical Note
This package used to be called
[multiGSEA](https://github.com/lianos/multiGSEA)), but it's name was changed
to avoid conflict with another package by the same name that was submitted to
Bioconductor version 3.12. References to `multiGSEA` in the literature
prior to ~2020 (of which I know of at least one) likely refer to this here
package.