MICB405-Metagenomics

A metagenome annotation workflow for UBC's Microbiology 405 course (Bioinformatics)

Outline

The data used for this project are from Saanich Inlet, a seasonally anoxic fjord. They are described further in two Scientific Data publications, one covering the geochemical data (Torres-Beltrán et al., 2017) and one covering the multi-omic data (Hawley et al., 2017). The goal of this workflow is to annotate metagenome-assembled genomes (MAGs) and reconstruct the nitrogen cycle through the water column.

To begin, metagenome sequencing reads were downloaded for samples from August 2013 (cruise 72):

SRR3719539
SRR3719544
SRR3719545
SRR3719563
SRR3719562
SRR3719654
SRR3719564

Each dataset was downloaded from the Sequence Read Archive (SRA) with SRA-tools' fastq-dump --split-files --gzip. Due to time and resource limitations, each FASTQ file was subset to its first 32 million reads using the head command, leaving 64 million reads per sample across the two files written by --split-files. 32 million was chosen because it is roughly the number of reads in the sample with the lowest coverage:

SAMN05224519,SRR3719564,32004368,SI072_LV_200m_DNA

Think of this subsetting as a form of rarefaction: every sample is reduced to the same sequencing depth.
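
The subsetting step can be sketched in shell as follows. A FASTQ record normally spans four lines, so 32 million reads correspond to 32,000,000 × 4 = 128,000,000 lines per file. This is a minimal sketch, not necessarily the exact commands that were run: the function names and the ".32M" output suffix are hypothetical, and the code assumes unwrapped four-line FASTQ records and the `<accession>_1.fastq.gz` / `<accession>_2.fastq.gz` file names that fastq-dump --split-files --gzip writes for paired-end runs.

```shell
# Number of reads to keep per FASTQ file (value taken from the text above).
READS_PER_FILE=32000000

# subset_fastq IN.fastq.gz OUT.fastq.gz [N_READS]
# Keep the first N_READS records of a gzipped FASTQ file.
# Assumes the standard 4-line FASTQ record layout (no wrapped sequences).
subset_fastq() {
    in_fq=$1
    out_fq=$2
    n_reads=${3:-$READS_PER_FILE}
    n_lines=$((n_reads * 4))
    # gzip -dc may be stopped early once head has read enough lines;
    # that is expected and does not affect the output.
    gzip -dc "$in_fq" | head -n "$n_lines" | gzip > "$out_fq"
}

# fetch_and_subset ACCESSION
# Download one run with the flags named in the text, then subset both mates.
# The ".32M" suffix is a hypothetical naming choice.
fetch_and_subset() {
    acc=$1
    fastq-dump --split-files --gzip "$acc"
    for mate in 1 2; do
        subset_fastq "${acc}_${mate}.fastq.gz" "${acc}_${mate}.32M.fastq.gz"
    done
}
```

With these functions defined, a loop such as `for acc in SRR3719539 SRR3719544; do fetch_and_subset "$acc"; done` would process the listed runs one at a time. Taking the head of a file keeps the first reads rather than a random sample, which is fast but only approximates random subsampling.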

Here is where the class takes over! The next steps are to use MEGAHIT to assemble the reads into contigs, MaxBin 2.0 to bin the contigs into genomes, CheckM to identify the best MAGs, Mash to identify the closest genomic relative in RefSeq (i.e. assign taxonomy), and finally Prokka to annotate the MAGs.

Recommended usage for each tool is provided in workflow.md.

The outline for the report is available in Evaluation.md.
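
To show how the five tools chain together, here is a rough shell sketch of the sequence described above. It is an illustration under stated assumptions and not a substitute for the course's recommended usage: the function names, thread count, directory layout, input file names, and the `refseq.msh` sketch file are all hypothetical, and only basic, commonly used options of each tool are shown. By default the script only prints the commands (DRY_RUN=1), so the plan can be inspected before anything is run.

```shell
# Sketch only: thread count and all paths are placeholder assumptions.
THREADS=${THREADS:-8}
REFSEQ_SKETCH=${REFSEQ_SKETCH:-refseq.msh}   # hypothetical RefSeq Mash sketch
DRY_RUN=${DRY_RUN:-1}                        # 1 = print commands, 0 = run them

# Echo a command to stderr and execute it unless DRY_RUN=1.
run() {
    echo "+ $*" >&2
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# assemble_and_bin SAMPLE: reads -> contigs -> bins -> bin quality report.
assemble_and_bin() {
    s=$1
    r1=${s}_1.32M.fastq.gz   # hypothetical names of the subsetted read files
    r2=${s}_2.32M.fastq.gz
    # MEGAHIT writes final.contigs.fa into its output directory.
    run megahit -1 "$r1" -2 "$r2" -t "$THREADS" -o "${s}_megahit"
    # MaxBin 2.0 takes an output prefix; bins are written as <prefix>.NNN.fasta.
    # Assumption: the installed MaxBin accepts gzipped reads (decompress if not).
    run mkdir -p "${s}_maxbin"
    run run_MaxBin.pl -contig "${s}_megahit/final.contigs.fa" \
        -reads "$r1" -reads2 "$r2" -thread "$THREADS" -out "${s}_maxbin/bin"
    # CheckM estimates completeness and contamination for every *.fasta bin.
    run checkm lineage_wf -x fasta -t "$THREADS" "${s}_maxbin" "${s}_checkm"
}

# classify_and_annotate BIN.fasta: closest RefSeq relative, then annotation.
classify_and_annotate() {
    bin=$1
    name=$(basename "$bin" .fasta)
    # mash dist prints one line per reference; the smallest distance (column 3)
    # marks the closest relative.
    run mash dist "$REFSEQ_SKETCH" "$bin"
    run prokka --outdir "${name}_prokka" --prefix "$name" --cpus "$THREADS" "$bin"
}
```

Calling `assemble_and_bin SRR3719564` followed by `classify_and_annotate SRR3719564_maxbin/bin.001.fasta` prints the planned commands. Setting DRY_RUN=0 executes them, provided the tools are installed and the input files exist.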

References

Torres-Beltrán, M., Hawley, A. K., Capelle, D., Zaikova, E., Walsh, D. A., Mueller, A., ... & Finke, J. (2017). A compendium of geochemical information from the Saanich Inlet water column. Scientific Data, 4, 170159.
Hawley, A. K., Torres-Beltrán, M., Zaikova, E., Walsh, D. A., Mueller, A., Scofield, M., ... & Shevchuk, O. (2017). A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific Data, 4, 170160.
Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10), 1674-1676.
Wu, Y. W., Simmons, B. A., & Singer, S. W. (2016). MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics, 32(4), 605-607.
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25(7), 1043-1055.
Ondov, B. D., Treangen, T. J., Melsted, P., Mallonee, A. B., Bergman, N. H., Koren, S., & Phillippy, A. M. (2016). Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology, 17(1), 132.
Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), 2068-2069.
