5. Amplicon and shotgun metagenomics datasets

Inês Mendes edited this page Oct 19, 2019 · 1 revision

In the DEN-IM pre-print, we analysed two datasets to evaluate the performance of the DEN-IM workflow. One contained shotgun metagenomics sequencing data from patient samples, and the other contained targeted metagenomics sequencing data obtained from Parameswaran et al.

Data availability

The 106 DENV-3 targeted metagenomics paired-end short-read datasets are available under BioProject PRJNA394021, and the 78 DENV-1 targeted metagenomics single-end short-read datasets are available under BioProject PRJNA321963. The 25 shotgun metagenomics datasets are available under BioProject PRJNA474413. The accession numbers for all the samples in the shotgun metagenomics dataset are available here.
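As an illustration of how the runs behind a BioProject could be listed, the following minimal sketch builds a query for the ENA Portal API file report and parses its tab-separated response. It is not part of DEN-IM or GetSeqENA; the function names are hypothetical, and the chosen fields (run_accession, library_layout, fastq_ftp) are an assumption about what is useful here.

```python
import csv
import io
from urllib.parse import urlencode

# ENA Portal API endpoint that reports the files attached to an accession.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(project_accession):
    """Build the file report URL for a BioProject (hypothetical helper)."""
    params = {
        "accession": project_accession,
        "result": "read_run",
        "fields": "run_accession,library_layout,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_filereport(tsv_text):
    """Map each run accession to its library layout and FASTQ locations."""
    runs = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # fastq_ftp holds one path per file, separated by semicolons.
        files = [path for path in row["fastq_ftp"].split(";") if path]
        runs[row["run_accession"]] = {
            "layout": row["library_layout"],
            "fastq": files,
        }
    return runs
```

The response text could be fetched with urllib.request.urlopen(filereport_url("PRJNA474413")) and passed to parse_filereport; the number of runs returned can then be compared with the dataset sizes stated above.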

Shotgun Metagenomics Dataset

Using the GetSeqENA tool, the raw sequencing data of the shotgun metagenomics dataset can be downloaded and stored in a folder named fastq/. The DEN-IM workflow was executed on the raw sequencing data with the default parameters and resources on an HPC cluster with 300 cores (600 threads) and 3 TB of RAM spread across 15 compute nodes: 9 with 254 GB of RAM and 6 with 126 GB of RAM.
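Before launching the workflow, it can help to confirm that every sample in fastq/ has the expected number of read files. The sketch below groups file names by sample under the assumption that paired files are named <sample>_1.fastq.gz and <sample>_2.fastq.gz and single-end files <sample>.fastq.gz; this naming scheme and the helper names are assumptions for illustration, not something DEN-IM or GetSeqENA guarantees.

```python
import re
from collections import defaultdict

# Assumed naming: <sample>_1 / <sample>_2 for mates, no suffix for single-end.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)(?:_(?P<mate>[12]))?\.(?:fastq|fq)(?:\.gz)?$"
)


def group_reads(filenames):
    """Group FASTQ file names by sample, ignoring files that do not match."""
    samples = defaultdict(list)
    for name in sorted(filenames):
        match = READ_PATTERN.match(name)
        if match is None:
            continue  # not a FASTQ file under the assumed naming scheme
        samples[match.group("sample")].append(name)
    return dict(samples)


def layout(files):
    """Classify a sample by how many read files it has."""
    if len(files) == 2:
        return "paired"
    if len(files) == 1:
        return "single"
    return "unexpected"
```

Calling group_reads(os.listdir("fastq")) and applying layout to each value gives a quick per-sample summary; any sample reported as "unexpected" is worth inspecting before the run.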

The workflow was cloned and run with the command nextflow run DEN-IM.nf -profile slurm_shifter. The resulting HTML is available here.
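For scripted runs, the command above can be wrapped in a small launcher. This is only a sketch: the command itself is the one quoted in the text, while the function names, the PATH check, and the workdir argument are assumptions added for illustration.

```python
import shutil
import subprocess


def build_command(pipeline="DEN-IM.nf", profile="slurm_shifter"):
    """Return the Nextflow invocation quoted above as an argument list."""
    return ["nextflow", "run", pipeline, "-profile", profile]


def run_workflow(workdir="."):
    """Run the workflow in workdir (assumed to contain DEN-IM.nf and fastq/)."""
    if shutil.which("nextflow") is None:
        raise RuntimeError("nextflow executable not found on PATH")
    # check=True raises CalledProcessError if Nextflow exits with an error.
    return subprocess.run(build_command(), cwd=workdir, check=True)
```

Passing the command as a list avoids shell quoting issues, and failing early when nextflow is missing gives a clearer error than a failed process launch.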

Targeted Metagenomics Dataset

Paired-end Dataset

The accession numbers for the 106 DENV-3 amplicon sequencing paired-end short-read datasets are available under BioProject PRJNA394021. The list of Run Accession IDs was obtained with NCBI’s RunSelector and the raw data was downloaded with the GetSeqENA tool. The DEN-IM workflow was executed on the raw sequencing data with default parameters and resources on the same HPC cluster as the shotgun metagenomics dataset.
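A list of Run Accession IDs is easy to sanity-check before downloading. The following sketch assumes the list is a plain-text file with one accession per line and that run accessions follow the SRR/ERR/DRR prefix plus digits pattern; the function name is hypothetical and the file format is an assumption about the export.

```python
import re

# Run accessions from SRA, ENA and DDBJ: SRR, ERR or DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def parse_accession_list(text):
    """Return (unique valid accessions in input order, rejected tokens)."""
    accepted = []
    rejected = []
    for line in text.splitlines():
        token = line.strip()
        if not token:
            continue  # skip blank lines
        if RUN_ACCESSION.match(token):
            if token not in accepted:
                accepted.append(token)  # drop duplicates, keep first occurrence
        else:
            rejected.append(token)
    return accepted, rejected
```

For the paired-end dataset, the length of the accepted list would be expected to equal 106, the number of datasets stated above.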

The workflow was cloned and run with the command nextflow run DEN-IM.nf -profile slurm_shifter. The resulting HTML is available here.

Single-end Dataset

The accession numbers for the 78 DENV-1 amplicon sequencing single-end short-read datasets are available under BioProject PRJNA321963. The list of Run Accession IDs was obtained with NCBI’s RunSelector and the raw data was downloaded with the GetSeqENA tool. The DEN-IM workflow was executed on the raw sequencing data with default parameters and resources on the same HPC cluster as the shotgun metagenomics dataset.

The workflow was cloned and run with the command nextflow run DEN-IM.nf -profile slurm_shifter. The resulting HTML is available here.