BioMeDS/metabarcoding_pipeline

 
 

Metabarcoding processing pipeline

by Alexander Keller (LMU Munich)

A simple script to process metabarcoding data (e.g. 16S V4), with amplicons generated following the protocols of:

  • 16S: Kozich et al. 2013 AEM
  • ITS2: Sickel et al. 2015 BMC Ecology

If you use this script, please kindly cite this article: https://doi.org/10.1098/rstb.2021.0171

Dependencies

What will the script do?

  • Un-gzipping files
  • Individual sample preparation
    • Merging forward and reverse reads
    • Quality filtering
    • Backup option: use forward reads only when the reverse reads are of poor quality
  • Community level processing
    • Dereplication
    • Denoising
    • ASV generation
    • Chimera (de novo) removal
    • Taxonomic classification
      • allows for multiple reference databases (iterative) with decreasing priority
      • all unclassified reads are hierarchically classified
    • Creation of a community table
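
The iterative classification with decreasing priority can be illustrated with a minimal sketch. It assumes, hypothetically, that each reference database has already produced a two-column, tab-separated hit table (ASV ID, taxonomy). The function name, file names, and table format are illustrative assumptions and are not taken from the pipeline's actual code or output. Databases are visited in priority order, and an ASV keeps the assignment from the first database that classified it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_hits_by_priority FILE...
# Each FILE is a hypothetical tab-separated hit table: <asv_id> TAB <taxonomy>.
# Files are passed in decreasing priority; the first hit seen per ASV wins.
# Output columns: ASV ID, taxonomy, name of the table that supplied the hit.
merge_hits_by_priority() {
    awk -F'\t' -v OFS='\t' '
        !($1 in seen) {
            seen[$1] = 1
            n = split(FILENAME, parts, "/")   # keep only the file name
            print $1, $2, parts[n]
        }' "$@"
}

# Demo with mock data in a temporary directory (all values are made up).
tmp=$(mktemp -d)
printf 'ASV1\tBacillus subtilis\n' > "$tmp/local_db.tsv"
printf 'ASV1\tBacillus\nASV2\tLactobacillus kunkeei\n' > "$tmp/global_db.tsv"

# local_db.tsv has higher priority than global_db.tsv.
merged=$(merge_hits_by_priority "$tmp/local_db.tsv" "$tmp/global_db.tsv")
printf '%s\n' "$merged"
rm -rf "$tmp"
```

In this mock run, ASV1 keeps the species-level hit from the higher-priority table, and ASV2, which the first table could not classify, falls through to the second one.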

Usage:

  1. Put all your raw sequencing files (.fastq or .fastq.gz) into a subfolder of the directory that contains this script (do not use full paths).

  2. Copy a config.txt from the resources folder, adapt it to your needs, and place it in your data folder. Consider checking the paths to binaries in the script file.

  3. You also need to add a config.txt file in which information about the databases is stored. An example is in the example directory.
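
The setup steps above can be checked before a run with a small helper. This is only a sketch: `check_data_folder` is a hypothetical function that is not part of the repository, and it assumes that config.txt sits directly inside the data folder next to the raw reads.

```shell
#!/usr/bin/env bash

# check_data_folder FOLDER
# Hypothetical pre-flight check (not part of the pipeline itself).
# Verifies that FOLDER is a relative path, exists, contains config.txt,
# and holds at least one .fastq or .fastq.gz file.
check_data_folder() {
    local folder=$1
    case $folder in
        /*) echo "error: use a relative subfolder, not a full path: $folder" >&2
            return 1 ;;
    esac
    [ -d "$folder" ] || { echo "error: no such folder: $folder" >&2; return 1; }
    [ -f "$folder/config.txt" ] || { echo "error: missing $folder/config.txt" >&2; return 1; }
    # compgen -G succeeds only if the glob matches at least one path.
    if ! compgen -G "$folder/*.fastq" > /dev/null &&
       ! compgen -G "$folder/*.fastq.gz" > /dev/null; then
        echo "error: no .fastq or .fastq.gz files in $folder" >&2
        return 1
    fi
    echo "ok: $folder looks ready"
}
```

With a data folder named, for example, my_run, the check could be chained with the pipeline call as `check_data_folder my_run && bash _processing_MB_0.2a.sh my_run`.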

Then you are ready to run:

bash _processing_MB_0.2a.sh <FOLDER>

Results will be written to a new subfolder of your current directory called <FOLDER>.<DATE>.
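
When a data folder has been processed several times, the most recent results folder can be picked up with a short helper. This is a sketch under assumptions: `latest_results_dir` is a hypothetical function, and because the exact format of <DATE> is not documented here, it compares modification times instead of parsing the date suffix.

```shell
#!/usr/bin/env bash

# latest_results_dir FOLDER
# Hypothetical helper: prints the most recently modified directory whose
# name starts with "FOLDER.", or returns 1 if none exists.
latest_results_dir() {
    local folder=$1 newest="" d
    for d in "$folder".*/; do
        [ -d "$d" ] || continue          # skips the unexpanded glob, too
        if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
            newest=$d
        fi
    done
    [ -n "$newest" ] || return 1
    printf '%s\n' "${newest%/}"          # strip the trailing slash
}
```

For example, `cd "$(latest_results_dir my_run)"` would change into the newest results folder of a data folder named my_run.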

If the analysis needs to be reverted, run the following command. It removes the generated files and restores the folder structure to its original state.

bash _revert_analysis_1.sh <FOLDER>

Import into R

In the <FOLDER>.<DATE> folder, there will be an R script for data import and basic ecological analyses.
