Screen L1000 dataset for drug candidates.
After obtaining RNA-seq data, first quantify the reads with Salmon. To run Salmon you need a FASTA file containing your reference transcripts and one or more FASTA/FASTQ files containing your reads.
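The Salmon step can be sketched from Python as follows. This is a minimal sketch, assuming paired-end reads and a local `salmon` executable; the file names, the index and output directory names, and the thread count are hypothetical placeholders, not names used by this repository. The commands are only printed here, and the `subprocess.run` line is left commented out.

```python
import shlex
import subprocess  # only needed if the commands are actually executed


def salmon_commands(transcripts_fa, reads_1, reads_2,
                    index_dir="salmon_index", out_dir="salmon_quant", threads=4):
    """Build the two Salmon command lines: index the transcriptome, then quantify."""
    # Build an index from the reference transcripts (FASTA).
    index_cmd = ["salmon", "index", "-t", transcripts_fa, "-i", index_dir]
    # Quantify paired-end reads; "-l A" lets Salmon infer the library type.
    quant_cmd = ["salmon", "quant", "-i", index_dir, "-l", "A",
                 "-1", reads_1, "-2", reads_2,
                 "-p", str(threads), "--validateMappings", "-o", out_dir]
    return index_cmd, quant_cmd


# Hypothetical input file names, for illustration only.
index_cmd, quant_cmd = salmon_commands(
    "transcripts.fa", "sample_1.fastq.gz", "sample_2.fastq.gz")

for cmd in (index_cmd, quant_cmd):
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run Salmon
```

For single-end reads, the `-1`/`-2` pair would be replaced by `-r` with a single reads file.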
In R (DE_Deseq_AD.r), install "tximeta" and "DESeq2". Import the quantified data from Salmon and process it into differential expression profiles. See the guide to DESeq2 for more information.
The top differentially expressed genes can then be used in the next step as disease signatures.
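One possible way to turn a DESeq2 results table into up and down gene lists is sketched below. It assumes the results were exported from R to a table indexed by gene with the standard DESeq2 columns `log2FoldChange` and `padj`. The cutoff `padj_cutoff=0.05` and the list size `n_top` are illustrative assumptions, not values prescribed by this pipeline, and the toy table stands in for real results.

```python
import pandas as pd


def split_signature(res, padj_cutoff=0.05, n_top=100):
    """Return (up_genes, down_genes) from a DESeq2-style results table.

    `res` is indexed by gene and has `log2FoldChange` and `padj` columns.
    Cutoffs are illustrative assumptions.
    """
    # DESeq2 reports NA for genes it filtered out; drop those rows first.
    sig = res.dropna(subset=["padj", "log2FoldChange"])
    sig = sig[sig["padj"] < padj_cutoff]
    # Strongest positive fold changes first.
    up = sig[sig["log2FoldChange"] > 0].sort_values(
        "log2FoldChange", ascending=False).head(n_top)
    # Strongest negative fold changes first.
    down = sig[sig["log2FoldChange"] < 0].sort_values(
        "log2FoldChange", ascending=True).head(n_top)
    return list(up.index), list(down.index)


# Toy results table with made-up gene names and values.
toy = pd.DataFrame(
    {"log2FoldChange": [2.5, -1.8, 0.3, 1.2, -3.0],
     "padj": [0.001, 0.01, 0.8, 0.04, None]},
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"],
)
up_genes, down_genes = split_signature(toy)
print(up_genes, down_genes)
```

With real data, `res` could be loaded with `pd.read_csv(path, index_col=0)` from a CSV written in R; that file's name and layout depend on how the results were exported.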
The Library of Integrated Cellular Signatures (LINCS) is an NIH program which funds the generation of perturbational profiles across multiple cell and perturbation types, as well as read-outs, at a massive scale. We built a pipeline, in parallel with the L1000 group, to process raw fluorescent intensity data into z-scores as perturbagen signatures, available at L1000-bayesian. Our Level 4 and Level 5 data are equivalent to the Level 4 and Level 5 data provided by L1000. Pre-computed datasets covering a majority of LINCS L1000 Phase I and Phase II are available in Downloads and on Zenodo.
Bayesian L1000 data
Description | Download |
---|---|
Plate control z-scores | Bayesian_GSE70138_Level4_ZSPC_n335465x978.h5 Bayesian_GSE92742_Level4_ZSPC_n1093191x978.h5 |
Combined z-scores by bio-replicates | Bayesian_GSE70138_Level5_COMPZ_n116218x978.h5 Bayesian_GSE92742_Level5_COMPZ_n361481x978.h5 |
Meta data
Description | Download |
---|---|
Signature IDs | GSE70138_Broad_LINCS_sig_info_2017-03-06.txt.gz GSE92742_Broad_LINCS_sig_info.txt.gz |
Perturbagen information | GSE70138_Broad_LINCS_pert_info_2017-03-06.txt.gz GSE92742_Broad_LINCS_pert_info.txt.gz |
Cell information | GSE70138_Broad_LINCS_cell_info_2017-04-28.txt.gz GSE92742_Broad_LINCS_cell_info.txt.gz |
Gene information | GSE70138_Broad_LINCS_gene_info_2017-03-06.txt.gz GSE92742_Broad_LINCS_gene_info.txt.gz |
Full meta data are available from the publications by the L1000 group: Phase I GSE92742 and Phase II GSE70138. They include perturbagen and cell line information associated with the signature and instance IDs in the datasets. For more information about LINCS L1000 data, see Connectopedia.
The z-score results (as HDF5) are compatible with those published by L1000 group. Each of them contains three datasets as follows:
- `/colid` are the signature IDs (Level 5) or instance IDs (Level 4);
- `/rowid` are the names of landmark genes;
- `/data` are the z-scores as a matrix.
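The three datasets listed above could be read in Python roughly as follows. This is a sketch, assuming the third-party `h5py` and `numpy` packages. The helper names are hypothetical. Because the on-disk orientation of `/data` is not stated here, the code compares the matrix shape with the lengths of `/rowid` and `/colid` rather than assuming one.

```python
import numpy as np


def decode_ids(raw):
    """HDF5 string datasets are often read back as bytes; convert them to str."""
    return [x.decode("utf-8") if isinstance(x, bytes) else str(x) for x in raw]


def orient_matrix(data, rowid, colid):
    """Return the z-score matrix as genes x signatures, transposing if needed."""
    data = np.asarray(data)
    if data.shape == (len(rowid), len(colid)):
        return data
    if data.shape == (len(colid), len(rowid)):
        return data.T
    raise ValueError(f"/data shape {data.shape} does not match the ID lengths")


def load_l1000_h5(path):
    """Read /rowid, /colid and /data from one of the HDF5 files."""
    import h5py  # third-party; imported here so the helpers above work without it

    with h5py.File(path, "r") as f:
        rowid = decode_ids(f["/rowid"][:])   # landmark gene names
        colid = decode_ids(f["/colid"][:])   # signature or instance IDs
        data = orient_matrix(f["/data"][:], rowid, colid)
    return data, rowid, colid


# Small in-memory illustration of the orientation check (made-up IDs).
toy = orient_matrix(np.zeros((3, 2)), rowid=["GENE_A", "GENE_B"],
                    colid=["SIG_1", "SIG_2", "SIG_3"])
print(toy.shape)
```

A call such as `load_l1000_h5("L1000_data/Bayesian_GSE70138_Level5_COMPZ_n116218x978.h5")` reads the whole matrix into memory; for the larger files it may be preferable to slice `f["/data"]` instead.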
- Download L1000 data. Download the following files into the folder L1000_data: Bayesian_GSE70138_Level5_COMPZ_n116218x978.h5, Bayesian_GSE92742_Level5_COMPZ_n361481x978.h5, GSE92742_Broad_LINCS_sig_info.txt.gz, GSE70138_Broad_LINCS_sig_info_2017-03-06.txt.gz. Then unzip the xxx_sig_info.txt.gz files.
- Prepare up- and down-regulated genes. Take the up- and down-regulated genes produced by DESeq2 or another pipeline and save them separately in up_genes.csv and down_genes.csv. Note that the gene identifiers need to be converted to official gene symbols. You can use a converter to do this.
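The unzip step in the first item can also be done from Python with the standard library alone. The sketch below assumes the `.gz` files sit in the `L1000_data` folder named above; the `gunzip` helper is a hypothetical name introduced for illustration.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(path):
    """Decompress `name.txt.gz` to `name.txt` next to it and return the new path."""
    src = Path(path)
    dst = src.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


data_dir = Path("L1000_data")
if data_dir.is_dir():
    # Decompress every signature-info file found in the folder.
    for gz_file in sorted(data_dir.glob("*_sig_info*.txt.gz")):
        print("unzipped:", gunzip(gz_file))
```

The original `.gz` files are left in place, so the step is safe to repeat.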
Use our pipeline (L1000_repurposing.ipynb) to search for drug candidates.
Or, in Python (calc_ES.py), use the up- and down-regulated genes as the disease signature to calculate an Enrichment Score (ES) against the L1000 profiles. The drugs with the most negative Enrichment Scores can be used as candidates to reverse the disease state.
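To illustrate the idea, here is a minimal sketch of one common way to score a drug profile against a disease signature: an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score, combined across the up and down gene sets. This is an assumption chosen for illustration and is not necessarily the formula implemented in calc_ES.py. The function names and toy data are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Signed maximum deviation of an unweighted running sum over a ranked list."""
    n_hit = sum(1 for g in ranked_genes if g in gene_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, extreme = 0.0, 0.0
    for g in ranked_genes:
        # Step up on a hit, down on a miss; both kinds of steps sum to zero overall.
        running += 1.0 / n_hit if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


def connectivity_score(zscores, genes, up_genes, down_genes):
    """Combine up/down enrichment; negative means the profile opposes the signature."""
    # Rank genes from most up-regulated to most down-regulated by the drug.
    order = sorted(range(len(genes)), key=lambda i: zscores[i], reverse=True)
    ranked = [genes[i] for i in order]
    es_up = enrichment_score(ranked, set(up_genes))
    es_down = enrichment_score(ranked, set(down_genes))
    if es_up * es_down > 0:
        return 0.0  # both sets enriched at the same end: no consistent direction
    return es_up - es_down


# Toy drug profile that pushes disease-up genes down and disease-down genes up.
genes = ["A", "B", "C", "D", "E", "F"]
drug_z = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
score = connectivity_score(drug_z, genes, up_genes=["E", "F"], down_genes=["A", "B"])
print(score)  # -2.0
```

In this toy case the score reaches its minimum of -2.0 because the drug fully reverses the signature; a profile that mimics the disease would score +2.0.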
The screening results will contain a long list of drugs. You can further check their experimental information, targets, or structures to find the best candidate.
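As a sketch of that follow-up step, the scores can be joined to the signature meta data to see which drug, cell line, dose, and time point each hit corresponds to. This assumes `pandas`, a score series indexed by signature ID, and the columns `sig_id`, `pert_iname`, `cell_id`, `pert_idose`, and `pert_itime` in the sig_info files; these column names are assumptions that should be checked against the downloaded files. The function name and toy values are hypothetical.

```python
import pandas as pd


def annotate_hits(scores, sig_info, n_top=20):
    """Attach experiment meta data to the `n_top` most negative scores."""
    # Most negative scores first: these are the candidate reversers.
    ranked = scores.sort_values().head(n_top).rename("score")
    wanted = ["pert_iname", "cell_id", "pert_idose", "pert_itime"]
    cols = [c for c in wanted if c in sig_info.columns]  # keep only columns present
    meta = sig_info.set_index("sig_id")[cols]
    return ranked.to_frame().join(meta, how="left")


# Toy data with made-up signature IDs and drug names.
toy_scores = pd.Series({"SIG_1": -1.5, "SIG_2": 0.4, "SIG_3": -0.2})
toy_info = pd.DataFrame({
    "sig_id": ["SIG_1", "SIG_2", "SIG_3"],
    "pert_iname": ["drug_a", "drug_b", "drug_c"],
    "cell_id": ["CELL_X", "CELL_X", "CELL_Y"],
})
hits = annotate_hits(toy_scores, toy_info, n_top=2)
print(hits)
```

With the real files, `sig_info` could be loaded with `pd.read_csv("L1000_data/GSE92742_Broad_LINCS_sig_info.txt", sep="\t")`, assuming the file is tab-delimited.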
Oliveros, Giovanni, et al. "Multi-scale predictive modeling discovers Ibudilast as a polypharmacological agent to improve hippocampal-dependent spatial learning and memory and mitigate plaque and tangle pathology in a transgenic rat model of Alzheimer’s disease." bioRxiv (2021).
Qiu, Yue, et al. "A Bayesian approach to accurate and robust signature detection on LINCS L1000 data." Bioinformatics 36.9 (2020): 2787-2795.