Tutorial
As input, DIALOGUE takes single-cell profiles of different cell types (see tpm below). For simplicity, we describe the pipeline as applied to single-cell transcriptomes, but it can also be applied to other cellular modalities (e.g., scATAC-seq, CITE-Seq).
It is usually recommended to also provide a more compact representation of the gene expression (X below), for example, the first principal components (PCs) from a PCA of each cell type.
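As a minimal sketch of how such a compact representation might be computed, the base R snippet below simulates a small TPM matrix for one cell type, log-transforms it, and keeps the first few principal components from prcomp. The simulated data, the log2(TPM + 1) transform, and the choice of 5 PCs are illustrative assumptions, not DIALOGUE defaults.

```r
set.seed(1)

# Hypothetical toy data: m genes x n cells of one cell type (TPM-like values).
m <- 200; n <- 60
tpm <- matrix(rexp(m * n, rate = 0.1), nrow = m, ncol = n,
              dimnames = list(paste0("gene", seq_len(m)),
                              paste0("cell", seq_len(n))))

# Log-transform, then run PCA with cells as rows (observations).
log.tpm <- log2(tpm + 1)
pca <- prcomp(t(log.tpm), center = TRUE, scale. = FALSE)

# Keep the first k1 PCs as the compact representation X (n x k1).
k1 <- 5
X <- pca$x[, seq_len(k1), drop = FALSE]
dim(X)
```

Running the PCA on the cells of one type only, rather than on all cells together, keeps the leading components focused on variation within that cell type.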
The input is provided as cell.type objects, each representing a specific cell type or subtype. You can generate these cell.type objects with:
make.cell.type(name = "Cell.type.name",tpm,samples,X,metadata)
where name is the cell type name (e.g., "macrophage", "DC", etc.);
tpm (m x n) is the single-cell gene expression matrix;
samples (n x 1) is the sample identity of each cell; if the data has spatial coordinates, each niche within the tissue is considered a different "sample", which gives more statistical power to identify multicellular programs (MCPs);
X (n x k1) is the "original feature space"; this can be PCs, NMF components, the gene expression matrix, or any other representation. It is recommended to use a representation with k1 << n and to perform the initial dimensionality reduction using only cells of the specific type or subtype, so that it adequately captures the variation within that subset;
metadata (n x k2) includes any other features of the cells that you might want to include as potential confounders or as biologically meaningful properties. DIALOGUE can then control for the former (see conf below) and examine the association of the resulting MCPs with the latter. DIALOGUE always controls for technical variability using the scaled log-transformed number of reads and/or genes detected, which you can also provide as a cellQ column in the metadata.
See ?DIALOGUE::make.cell.type for more information.
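The shapes described above can be checked before building a cell.type object. The following is a minimal base R sketch with simulated data; the cell and sample names are hypothetical, and the cellQ definition used here (scaled log of the number of detected genes per cell) is one reading of the description above, not necessarily the package's exact computation. The final make.cell.type call is left as a comment because it requires the DIALOGUE package.

```r
set.seed(2)
m <- 100; n <- 40; k1 <- 4

# tpm: m genes x n cells (hypothetical toy values, with some zeros).
tpm <- matrix(rpois(m * n, lambda = 2), nrow = m, ncol = n,
              dimnames = list(paste0("gene", seq_len(m)),
                              paste0("cell", seq_len(n))))

# samples: one sample label per cell (length n).
samples <- rep(paste0("sample", 1:4), each = n / 4)

# X: n cells x k1 features, here the first PCs of the log-transformed data.
X <- prcomp(t(log2(tpm + 1)))$x[, seq_len(k1), drop = FALSE]

# metadata: n cells x k2 features, including an assumed cellQ covariate.
n.genes <- colSums(tpm > 0)
metadata <- data.frame(cellQ = as.numeric(scale(log2(n.genes + 1))),
                       gender = rep(c("Female", "Male"), times = n / 2),
                       row.names = colnames(tpm))

# Sanity checks on the dimensions described above.
stopifnot(ncol(tpm) == n, length(samples) == n,
          nrow(X) == n, nrow(metadata) == n,
          identical(rownames(X), colnames(tpm)))

# With DIALOGUE installed, the object would then be built as in the tutorial:
# macrophages <- DIALOGUE::make.cell.type(name = "macrophage",
#                                         tpm, samples, X, metadata)
```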
To run DIALOGUE, generate rA, a list of the cell.type objects you want to include in the analysis (see example), decide how many MCPs you want to identify (k), and call:
> param <- DLG.get.param(k = 3,
                         results.dir = "DLG.results/",
                         conf = c("gender","sample.quality","cellQ"), # Confounding factors
                         pheno = "pathology") # Phenotype (optional)
> R <- DIALOGUE.run(rA = rA, # list of cell.type objects
                    main = "RunName",
                    param = param)
Note that DIALOGUE will always find the same MCPs, or a subset of them, no matter which k is used.
If there are any potential confounders that DIALOGUE needs to account for, specify them with conf. By default, DIALOGUE uses only the "cellQ" column.
pheno is an optional parameter naming a specific phenotype to be tested for association with the identified MCPs. If used, it must also be provided as a column in the metadata of each cell type. For example:
> head(rA$A@metadata[,-1])
                            cellQ gender location clinical.status cell.subtypes pathology
N7.EpiA.AAACGCACAATCGC 0.03746722 Female      Epi    Non-inflamed           TA2      TRUE
N7.EpiA.AGATATTGATCGGT 0.09329337 Female      Epi    Non-inflamed           TA2      TRUE
N7.EpiA.AGTCTACTTCTCTA 0.24291245 Female      Epi    Non-inflamed           TA2      TRUE
N7.EpiA.ATATACGAAGTACC 0.15548895 Female      Epi    Non-inflamed           TA2      TRUE
N7.EpiA.ATCTGTTGTCATTC 0.05832397 Female      Epi    Non-inflamed           TA2      TRUE
N7.EpiA.ATTCCAACTTTGGG 0.17234919 Female      Epi    Non-inflamed           TA2      TRUE
See ?DIALOGUE::DIALOGUE.run for more information.
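Because conf and pheno refer to metadata columns, it can help to confirm that every cell type has them before running. The following is a minimal sketch that uses plain data frames as hypothetical stand-ins for the metadata of each cell.type object (in real objects it would be accessed as in rA$A@metadata above); the helper name missing.columns is invented for illustration and is not part of DIALOGUE.

```r
# Hypothetical stand-ins for the metadata of two cell types.
metadata.list <- list(
  A = data.frame(cellQ = c(0.04, 0.09), gender = c("Female", "Female"),
                 pathology = c(TRUE, TRUE)),
  B = data.frame(cellQ = c(0.17, 0.24), gender = c("Male", "Female"))
)

conf  <- c("gender", "cellQ")
pheno <- "pathology"

# Illustrative helper (not part of DIALOGUE): for each cell type, list the
# required columns that are absent from its metadata.
missing.columns <- function(metadata.list, required) {
  lapply(metadata.list, function(md) setdiff(required, colnames(md)))
}

absent <- missing.columns(metadata.list, c(conf, pheno))
absent
```

In this toy example, cell type B lacks the pathology column, so it would need that column added before pheno = "pathology" could be used.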
DIALOGUE will identify sets of co-regulated genes across the different cell types, which we call multicellular programs (MCPs). Each MCP consists of several cell-type-specific gene subsets.
The output R includes:
MCPs - the MCPs given as a list of gene sets;
scores - the MCPs' scores in each cell;
gene.pval - the cross-cell-type p-values of each gene in the MCP;
pref - the correlation (R) and association (mixed-effects p-value) between the cell-type-specific components of each MCP;
pheno - the association of each MCP with the phenotype of interest, given as direction times -log10(p-value).
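The "direction times -log10(p-value)" convention used for pheno can be illustrated with a small base R sketch. The effect sizes and p-values below are made up, and the helper name signed.log.p is hypothetical; how DIALOGUE itself derives the direction and the p-value is not shown here.

```r
# Illustrative helper (not part of DIALOGUE): combine the sign of an effect
# with the strength of its p-value into a single signed score.
signed.log.p <- function(effect, p.value) {
  sign(effect) * -log10(p.value)
}

# Made-up effects and p-values for three MCPs.
effect  <- c(MCP1 = 0.8, MCP2 = -0.3, MCP3 = 0.1)
p.value <- c(MCP1 = 1e-4, MCP2 = 0.01, MCP3 = 0.5)

z <- signed.log.p(effect, p.value)
round(z, 2)
```

Under this convention, large positive values indicate a strong positive association, large negative values a strong negative one, and values near zero indicate weak evidence in either direction.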
DIALOGUE will also generate different plots as explained here.