
Tutorial

Livnat Jerby edited this page Aug 10, 2023 · 19 revisions

As input, DIALOGUE takes single-cell profiles of different cell types (see tpm below). For simplicity, we describe the pipeline as applied to single-cell transcriptomes, but it can also be applied to other cellular modalities (e.g., scATAC-seq, CITE-seq).

It is usually recommended to also provide a more compact representation of the gene expression (X below), for example, the first principal components (PCs) from a PCA of each cell type.

The input is provided as cell.type objects, each representing a specific cell type or subtype. You can generate these cell.type objects with:

make.cell.type(name = "Cell.type.name",tpm,samples,X,metadata)

where name is the cell type name (e.g., "macrophage", "DC", etc.);

tpm (m x n) is the single-cell gene expression matrix (m genes by n cells);

samples (n x 1) is the sample identity of each cell; if the data has spatial coordinates then each niche within the tissue is considered to be a different "sample", giving more statistical power to identify multicellular programs (MCPs);

X (n x k1) is the "original feature space"; this can be PCs, NMF components, the gene expression matrix, or any other representation. It is recommended to use a representation with k1 << n, and to perform the initial dimensionality reduction using only cells of a specific type or subtype, so that it adequately captures the variation within that subset.

metadata (n x k2) includes any other features of the cells that you might want to include as potential confounders or as biologically meaningful properties. DIALOGUE can then control for the former (see conf below) and examine the association of the resulting MCPs with the latter. DIALOGUE always controls for technical variability using the scaled log-transformed number of reads and/or genes detected, which you can also provide as a cellQ column in the metadata.

See ?DIALOGUE::make.cell.type for more information.
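The inputs described above can be assembled along the following lines. This is a minimal sketch on simulated data, not the package's own example: the gene, cell, and sample names, the matrix sizes, the log2(tpm + 1) transform before PCA, and the exact cellQ formula are all assumptions made for illustration. The call to make.cell.type is left commented out so that the block runs without DIALOGUE installed.

```r
set.seed(1)

m  <- 200  # genes
n  <- 60   # cells of one cell type
k1 <- 5    # number of PCs to keep (k1 << n)

# Simulated expression matrix (genes x cells); real data would hold TPM values.
tpm <- matrix(rexp(m * n, rate = 0.1), nrow = m, ncol = n,
              dimnames = list(paste0("gene", seq_len(m)),
                              paste0("cell", seq_len(n))))
tpm[tpm < 2] <- 0  # mimic dropouts (assumption, for illustration only)

# Sample identity of each cell (length n).
samples <- rep(paste0("sample", 1:6), each = n / 6)

# Compact representation X (cells x k1): PCA on cells of this cell type only.
pca <- prcomp(t(log2(tpm + 1)), center = TRUE, scale. = FALSE)
X   <- pca$x[, seq_len(k1), drop = FALSE]

# Technical quality: scaled log-transformed number of genes detected per cell.
# The exact transform DIALOGUE expects is an assumption here.
n.genes <- colSums(tpm > 0)
cellQ   <- as.numeric(scale(log2(n.genes)))

# Per-cell metadata (n x k2); gender is assigned per sample.
metadata <- data.frame(
  cellQ  = cellQ,
  gender = ifelse(samples %in% paste0("sample", 1:3), "Female", "Male"),
  row.names = colnames(tpm)
)

# Dimensions should agree: tpm is m x n, samples has length n, X is n x k1.
stopifnot(ncol(tpm) == length(samples), nrow(X) == n, nrow(metadata) == n)

# With DIALOGUE installed, the object would then be created as in the call above:
# ct <- DIALOGUE::make.cell.type(name = "macrophage", tpm, samples, X, metadata)
```

Running the PCA separately for each cell type, rather than once on all cells, follows the recommendation above that X should capture variation within the specific subset.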

To run DIALOGUE, generate rA, a list of the cell.type objects you want to include in the analysis (see example), decide how many MCPs you want to identify (k), and call:

> param <- DLG.get.param(k = 3, 
                         results.dir = "DLG.results/", 
                         conf = c("gender","sample.quality","cellQ"), # Confounding factors
                         pheno = "pathology") # Phenotype (optional)

> R<-DIALOGUE.run(rA = rA, # list of cell.type objects
                  main = "RunName",
                  param = param)
              

Note that DIALOGUE will always find the same MCPs or a subset of them, no matter which k is used.

If there are any potential confounders DIALOGUE needs to account for, specify them with conf. By default, it will just use the "cellQ" column.

pheno is an optional input parameter that names a specific phenotype to be tested for association with the identified MCPs. If used, it must also be provided as a column in the metadata of each cell type. For example:

> head(rA$A@metadata[,-1])
                            cellQ gender location clinical.status cell.subtypes pathology
N7.EpiA.AAACGCACAATCGC 0.03746722 Female      Epi    Non-inflamed           TA2      TRUE
N7.EpiA.AGATATTGATCGGT 0.09329337 Female      Epi    Non-inflamed           TA2      TRUE
N7.EpiA.AGTCTACTTCTCTA 0.24291245 Female      Epi    Non-inflamed           TA2      TRUE
N7.EpiA.ATATACGAAGTACC 0.15548895 Female      Epi    Non-inflamed           TA2      TRUE
N7.EpiA.ATCTGTTGTCATTC 0.05832397 Female      Epi    Non-inflamed           TA2      TRUE
N7.EpiA.ATTCCAACTTTGGG 0.17234919 Female      Epi    Non-inflamed           TA2      TRUE

See ?DIALOGUE::DIALOGUE.run for more information.
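Because conf and pheno refer to metadata columns by name, it can help to confirm before the run that every cell type carries them. The following is a minimal base-R sketch of such a check, written against plain data frames. The helper name missing.columns and the toy metadata are hypothetical and are not part of DIALOGUE.

```r
# Hypothetical helper: for each cell type, list the required metadata
# columns that are absent.
missing.columns <- function(metadata.list, required) {
  lapply(metadata.list, function(md) setdiff(required, colnames(md)))
}

conf  <- c("gender", "sample.quality", "cellQ")
pheno <- "pathology"

# Toy metadata for two cell types; in practice these would be taken from
# the metadata of each cell.type object (e.g., rA$A@metadata).
md <- list(
  A = data.frame(cellQ = c(0.04, 0.09), gender = "Female",
                 sample.quality = c(1.2, 0.8), pathology = TRUE),
  B = data.frame(cellQ = c(0.17, 0.24), gender = "Male",
                 pathology = FALSE)
)

missing <- missing.columns(md, c(conf, pheno))
print(missing)
```

In this toy input, cell type B lacks sample.quality, so it would need that column added (or the factor dropped from conf) before the run.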

Output

DIALOGUE will identify sets of co-regulated genes across the different cell types, which we call multicellular programs (MCPs). Each MCP consists of several cell-type-specific gene subsets.

The output R includes:

MCPs - the MCPs given as a list of gene sets;

scores - the MCPs' scores in each cell;

gene.pval - the cross-cell-type p-values of each gene in the MCP;

pref - the correlation (R) and association (mixed-effects p-value) between the cell-type-specific components of each MCP;

pheno - the association of each MCP with the phenotype of interest, given as direction times -log10(p-value).
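Since the pheno value packs direction and significance into one number, both can be recovered from its sign and magnitude. A small sketch with made-up values follows; the numbers and MCP names are illustrative, not DIALOGUE output, and the layout of the actual pheno element may differ.

```r
# Made-up signed scores: direction * -log10(p-value), one per MCP.
pheno.scores <- c(MCP1 = 3, MCP2 = -2, MCP3 = 0.5)

direction <- sign(pheno.scores)       # sign gives the direction of association
p.value   <- 10^(-abs(pheno.scores))  # magnitude back-transforms to the p-value

decoded <- data.frame(direction, p.value, significant = p.value < 0.05)
print(decoded)
```

The 0.05 cutoff is an arbitrary choice for the example; any multiple-testing correction is left to the reader.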

DIALOGUE will also generate different plots as explained here.
