Scripts for annotating 10x Genomics scRNA-seq analysis data
This repository requires that the pandoc and libhdf5-dev libraries are installed:
sudo apt-get install pandoc libhdf5-dev
It also depends on the H5MANIPULATOR, jsonlite, rmarkdown, and optparse libraries.
jsonlite, rmarkdown, and optparse are available from CRAN, and can be installed in R using:
install.packages("jsonlite")
install.packages("rmarkdown")
install.packages("optparse")
H5MANIPULATOR is installed from GitHub with devtools, using a GitHub personal access token (PAT):
Sys.setenv(GITHUB_PAT = "[your_PAT_here]")
devtools::install_github("bwh-bioinformatics-hub/H5MANIPULATOR")
This repository can add important QC characteristics and cell metadata to 10x Genomics scRNA-seq data. It requires the filtered_feature_bc_matrix.h5, molecule_info.h5, and metrics_summary.csv files generated by cellranger count as inputs, as well as a SampleSheet.csv file (as described below), and generates a decorated output .h5 file based on these parameters and a SampleID.
There are 5 parameters for this script:
- -i or --in_h5: The path to the filtered_feature_bc_matrix.h5 file from cellranger outs/
- -l or --in_mol: The path to the molecule_info.h5 file from cellranger outs/
- -s or --in_sum: The path to the metrics_summary.csv file from cellranger outs/
- -k or --in_key: The path to SampleSheet.csv
- -j or --in_sample: Sample Name
An example run for a cellranger count result is:
Rscript --vanilla \
tenx-rnaseq-pipeline/tenx_rna_metadata_update.R \
-i outs/filtered_feature_bc_matrix.h5 \
-l outs/molecule_info.h5 \
-s outs/metrics_summary.csv \
-k outs/SampleSheet.csv \
-j Sample_Name
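Because the script needs all four input files, it can help to confirm they exist before launching R. The following is a minimal sketch only: check_inputs is a hypothetical helper that is not part of this repository, and it assumes SampleSheet.csv sits in outs/ next to the cellranger files, as in the example run above.

```shell
# Hypothetical pre-flight helper (not part of this repository).
# Assumes all four inputs live in one directory, as in the example run.
# Prints each missing file to stderr and returns non-zero if any is absent.
check_inputs() {
  outs_dir="$1"
  missing=0
  for f in filtered_feature_bc_matrix.h5 molecule_info.h5 \
           metrics_summary.csv SampleSheet.csv; do
    if [ ! -f "$outs_dir/$f" ]; then
      echo "missing input: $outs_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage sketch (commented out so the block runs without R installed):
# check_inputs outs && Rscript --vanilla \
#   tenx-rnaseq-pipeline/tenx_rna_metadata_update.R \
#   -i outs/filtered_feature_bc_matrix.h5 \
#   -l outs/molecule_info.h5 \
#   -s outs/metrics_summary.csv \
#   -k outs/SampleSheet.csv \
#   -j Sample_Name
```

If SampleSheet.csv is kept elsewhere, pass its real path to -k and drop it from the list of files checked.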
SampleSheet.csv should have 3 columns: SampleID, Type (e.g. Control or Treatment), and LibraryID:
SampleID,Type,LibraryID
CRCI1,Saline,CRN00234043
CRCI2,Saline,CRN00234044
CRCI3,Chemotherapy,CRN00234045
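The layout above can be checked from the shell before running the script. This is a rough sketch under stated assumptions: check_sample_sheet is a hypothetical helper, not something this repository provides, and it assumes the header must match SampleID,Type,LibraryID exactly and that no field contains a quoted comma.

```shell
# Hypothetical validator (not part of this repository).
# Checks the header line and that every line has exactly 3 comma-separated fields.
# Returns non-zero and prints a message for each problem found.
check_sample_sheet() {
  sheet="$1"
  awk -F',' '
    BEGIN { bad = 0 }
    { sub(/\r$/, "") }   # tolerate Windows line endings
    NR == 1 && $0 != "SampleID,Type,LibraryID" {
      print "unexpected header: " $0; bad = 1
    }
    NF != 3 {
      print "line " NR ": expected 3 fields, found " NF; bad = 1
    }
    END { exit bad }
  ' "$sheet"
}

# Usage sketch:
# check_sample_sheet outs/SampleSheet.csv
```

Blank lines are reported as errors, since they have zero fields.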
Outputs: two files will be generated. The .h5 file will be named based on Library and SampleID, while the JSON metrics file for this well will be named based on Library:
- .h5 file: [SampleID]_[Type].h5, e.g. CRCI1.h5
- JSON file: [LibraryID]_sample_metrics.json, e.g. CRCI1-Saline_sample_metrics.json