qtl_norm_qc

This repository contains the scripts needed to run Quality Control (QC) and Normalisation on phenotype count matrices generated by the rnaseq pipeline.

qtl_norm_qc
- QC methods
Running the QC
- Mandatory QC parameters
- Optional QC Parameters
Running the Normalisation
- Mandatory Normalisation parameters
- Optional Normalisation Parameters
Using Software Container to perform QC and Normalisation
- Using Docker Container
  - Using the ready-to-use container (DockerHub)
  - Executing the script with Docker container
- Using Singularity Container

QC methods

To detect outlier samples we use PCA and MDS. To detect mislabelled samples we use sex-specific gene expression analysis.

PCA

PCA is a linear dimension reduction method that aims to capture as much of the variance in a multidimensional dataset as possible in a small number of principal components. Plotting the first few components therefore shows most of the variation and makes it easy to see whether any samples look like obvious outliers. PCA is one of the most commonly used procedures to summarize a multivariate dataset and detect outliers in a sample population.

MDS

MDS is an exploratory technique used to identify unrecognized dimensions of a dataset (Mugavin, 2008). MDS reduces a multidimensional dataset to relatively simple, easy-to-visualize structures, which helps us to identify outliers after plotting and analysing them. In contrast to PCA, the non-metric MDS used here works on the distances between each pair of samples, is non-linear, and forces all of the data into a smaller number of dimensions. We explored MDS outliers of the phenotype count matrices by performing hierarchical clustering. TPM (Wagner et al. 2012) values were used on a log2-transformed scale (log2(0.1 + TPM)). Pearson correlation was used as the correlation measure, and the distance between two samples was defined as distance = 1 - correlation.

We used the isoMDS function from the MASS R package (Cox and Cox 2000; Ripley 2007; Venables and Ripley 2002) to summarize the data in two dimensions (k=2).
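The distance computation described above can be illustrated with a small awk sketch. It only shows the arithmetic (the log2(0.1 + TPM) transform followed by 1 - Pearson correlation for every pair of samples); the QC itself is done in R, where a distance matrix like this is what isoMDS summarizes. The input layout (a tab-separated TPM matrix with a header row of sample names and phenotype IDs in the first column) and the function name `tpm_distance_matrix` are assumptions for illustration, not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch only: pairwise sample distances (1 - Pearson correlation) on
# log2(0.1 + TPM). Assumed (hypothetical) input: tab-separated, header row
# "phenotype_id<TAB>sample1<TAB>sample2...", one phenotype per row.
tpm_distance_matrix() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      n = NF - 1
      for (j = 2; j <= NF; j++) name[j - 1] = $j
      next
    }
    {
      g++
      # log2(0.1 + TPM): awk log() is the natural log, so divide by log(2)
      for (j = 2; j <= NF; j++) x[g, j - 1] = log(0.1 + $j) / log(2)
    }
    END {
      # mean log expression of each sample
      for (a = 1; a <= n; a++) {
        s = 0
        for (i = 1; i <= g; i++) s += x[i, a]
        mean[a] = s / g
      }
      row = "sample"
      for (a = 1; a <= n; a++) row = row OFS name[a]
      print row
      for (a = 1; a <= n; a++) {
        row = name[a]
        for (b = 1; b <= n; b++) {
          sab = 0; saa = 0; sbb = 0
          for (i = 1; i <= g; i++) {
            da = x[i, a] - mean[a]
            db = x[i, b] - mean[b]
            sab += da * db; saa += da * da; sbb += db * db
          }
          # constant samples have no defined correlation
          if (saa == 0 || sbb == 0) { row = row OFS "NA"; continue }
          d = 1 - sab / sqrt(saa * sbb)
          if (d > -1e-12 && d < 1e-12) d = 0   # avoid printing -0.0000
          row = row OFS sprintf("%.4f", d)
        }
        print row
      }
    }' "$1"
}
```

For example, `tpm_distance_matrix tpm.tsv` prints a symmetric sample-by-sample matrix with zeros on the diagonal; perfectly correlated samples get distance 0 and perfectly anti-correlated samples get distance 2.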

Sex-specific Gene Expression

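We generate a scatter plot with the expression of the XIST gene (ENSG00000229807, found only in females) on the horizontal axis and the expression of Y chromosome genes (found only in males) on the vertical axis, and set the color of each sample according to its donor's sex. Samples whose position in the plot disagrees with the recorded sex of their donor are likely to be mislabelled.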
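The values behind this plot can be tabulated with a short awk sketch that reports, for each sample, its XIST expression and its summed expression of Y chromosome genes. It assumes (hypothetically) that the phenotype metadata file has header columns named `phenotype_id` and `chromosome`, with Y genes labelled `Y` or `chrY`, and that the first column of the expression matrix holds Ensembl gene IDs with or without a version suffix. The function name `sex_check_table` is illustrative only; the repository's own plot is produced by its R scripts.

```shell
#!/usr/bin/env bash
# Sketch only: per-sample XIST expression and total chrY expression.
# $1: expression matrix (tab-separated, header = phenotype_id + sample names)
# $2: phenotype metadata (tab-separated; assumed columns phenotype_id, chromosome)
sex_check_table() {
  awk -F'\t' -v OFS='\t' -v xist_id='ENSG00000229807' '
    # First file: phenotype metadata; locate columns by header name
    FNR == NR {
      if (FNR == 1) {
        for (j = 1; j <= NF; j++) {
          if ($j == "phenotype_id") id_col = j
          if ($j == "chromosome") chr_col = j
        }
        next
      }
      if ($chr_col == "Y" || $chr_col == "chrY") on_y[$id_col] = 1
      next
    }
    # Second file: expression matrix header holds the sample names
    FNR == 1 { n = NF - 1; for (j = 2; j <= NF; j++) name[j - 1] = $j; next }
    {
      id = $1
      sub(/\..*$/, "", id)   # drop an Ensembl version suffix such as .12
      for (j = 2; j <= NF; j++) {
        if (id == xist_id) xist[j - 1] += $j
        if (($1 in on_y) || (id in on_y)) y_sum[j - 1] += $j
      }
    }
    END {
      print "sample_id", "XIST", "chrY_total"
      for (a = 1; a <= n; a++) print name[a], xist[a] + 0, y_sum[a] + 0
    }' "$2" "$1"
}
```

Joining this table with the sex column of the sample metadata gives the coloring used in the plot; a sample with high XIST but a male label (or the reverse) is a candidate for closer inspection.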

MBV

MBV (Match BAM to VCF) is a quality control method that matches the aligned reads of each sample (BAM files) to the genotyped samples in a VCF file. The script reports the best matches as a tab-separated table and produces a scatter plot for each sample.

Running the QC

To run the QC, clone (download) this GitHub repository to your local machine and navigate into the cloned folder:

git clone https://github.com/kerimoff/qtl_norm_qc.git
cd qtl_norm_qc

The normaliseCountMatrix.R script accepts the following parameters:

Mandatory QC parameters

`--count_matrix` or `-c`

Path to the count matrix file (tab-separated).

`--sample_meta` or `-s`

Path to the sample metadata file (tab-separated).

`--phenotype_meta` or `-p`

Path to the phenotype metadata file (tab-separated).

Optional QC Parameters

`--quant_method` or `-q`

Quantification method. Possible values: gene_counts, leafcutter, txrevise, transcript_usage and exon_counts

Default Value: gene_counts

`--outdir` or `-o`

Path to the output directory

Default Value: ./normalised_results/

`--name_of_study` or `-n`

Custom name of the study. By default, the study name is extracted from the sample metadata file; if this parameter is provided, it overrides that value.

`--build_html`

Flag to build interactive plotly HTML plots.

Default Value: FALSE

`--mbvdir` or `-m`

Path to the directory containing the MBV quantification files.

An example script for running the QC can be found here.

Running the Normalisation

Mandatory Normalisation parameters

`--count_matrix` or `-c`

Path to the count matrix file (tab-separated).

`--sample_meta` or `-s`

Path to the sample metadata file (tab-separated).

`--phenotype_meta` or `-p`

Path to the phenotype metadata file (tab-separated).

Optional Normalisation Parameters

`--quant_method` or `-q`

Quantification method. Possible values: gene_counts, leafcutter, txrevise, transcript_usage and exon_counts

Default Value: gene_counts

`--outdir` or `-o`

Path to the output directory

Default Value: ./normalised_results/

`--name_of_study` or `-n`

Custom name of the study. By default, the study name is extracted from the sample metadata file; if this parameter is provided, it overrides that value.

An example script for running the normalisation can be found here.
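A minimal sketch of such a run script is shown below, combining the documented parameters with their documented defaults (`gene_counts`, `./normalised_results/`). The script location `normalisation/normaliseCountMatrix.R`, the environment variable names and the function name `run_normalisation` are assumptions for illustration; adjust them to your checkout. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch of a normalisation run script. The default script path and all
# variable names are hypothetical; only the command-line flags come from
# the parameter list above.
run_normalisation() {
  local script="${SCRIPT:-normalisation/normaliseCountMatrix.R}"
  local cmd=(
    Rscript "$script"
    --count_matrix "${COUNT_MATRIX:?set COUNT_MATRIX}"
    --sample_meta "${SAMPLE_META:?set SAMPLE_META}"
    --phenotype_meta "${PHENOTYPE_META:?set PHENOTYPE_META}"
    --quant_method "${QUANT_METHOD:-gene_counts}"
    --outdir "${OUTDIR:-./normalised_results/}"
  )
  # --name_of_study is optional; only pass it when STUDY_NAME is set
  if [[ -n "${STUDY_NAME:-}" ]]; then
    cmd+=(--name_of_study "$STUDY_NAME")
  fi
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # show the command without executing it
  else
    "${cmd[@]}"
  fi
}
```

For example, `COUNT_MATRIX=counts.tsv SAMPLE_META=samples.tsv PHENOTYPE_META=phenotypes.tsv run_normalisation` runs the normalisation with the default quantification method and output directory. A script built this way is the kind of bash script that can later be executed inside the container.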

Using Software Container to perform QC and Normalisation

All the required software is packaged into a Docker container (kerimoff/eqtlutils) that is published on DockerHub.

Using Docker Container
Using Singularity Container

Using Docker Container

All required dependencies are built into the Docker container.

Using the ready-to-use container (DockerHub)

No additional steps are required to use the pre-built container from DockerHub. When you run a container from the kerimoff/eqtlutils image, Docker first checks whether the image exists on the local computer and, if it does not, automatically tries to pull it from DockerHub.

Executing the script with Docker container

To execute the script, first start the container:

docker run -idt -v "$(pwd)":/work_dir -w /work_dir --name qtl_norm_qc_cont kerimoff/eqtlutils bash

This starts a container named qtl_norm_qc_cont in detached mode, mounts the current directory (qtl_norm_qc) to /work_dir inside the container and sets /work_dir as the working directory.

To check the status of your container, run:

docker ps -a

The output should list a running container named qtl_norm_qc_cont (usually in the first row).

To run the normalisation or QC, execute the bash script you created inside the running container with the bash command:

docker exec -it qtl_norm_qc_cont bash run_fc_qc.sh

Using Singularity Container

It is straightforward to run the scripts with a Singularity container:

singularity exec -B /path/in/host/:/path/in/container/ docker://kerimoff/eqtlutils bash run_fc_qc.sh

Running this command automatically downloads the kerimoff/eqtlutils image from DockerHub and runs the run_fc_qc.sh script inside the container.

The -B flag binds a path on your host computer to a path inside the container, so make sure that the data to be processed are reachable from inside the container.
