Commit 151f1de (parent 05672a3): "update readme", committed by schellt on Oct 14, 2019. Changes `readme.md` (47 additions, 50 deletions).

Waldvogel A-M, Schell T

## Description
__Trimming (overrepresented *k*-mers from) multiple fastq files.__

After each trimming round, a test for overrepresented *k*-mers is run unless it is switched off (`-nok`). Any overrepresented *k*-mers found are added to a global list, and the trimming is repeated on the original input files together with the overrepresented *k*-mers until no overrepresentation is detected.
Autotrim uses Trimmomatic for trimming, FastQC for overrepresentation screening and MultiQC to generate summary reports.
The dependencies are detected automatically if `java`, `trimmomatic`, `fastqc` or `multiqc` are in your `$PATH`. Running MultiQC is optional and is skipped if `multiqc` is not found in your `$PATH` and `-mqcp` is not specified.
Autotrim automatically distinguishes between paired-end and single-end data.
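
The iterative procedure can be sketched in Python as follows. This is not the code of `autotrim.pl` itself: `trim` and `screen` are hypothetical callables standing in for one Trimmomatic run and one FastQC *k*-mer screen, and the `max_rounds` cap is a safeguard added for this sketch that the README does not describe.

```python
from typing import Callable, FrozenSet, Iterable, List, Set


def autotrim_loop(
    datasets: List[str],
    trim: Callable[[str, FrozenSet[str]], object],
    screen: Callable[[object], Iterable[str]],
    max_rounds: int = 20,
) -> Set[str]:
    """Trim and screen until no overrepresented k-mers are reported.

    trim(dataset, kmers): hypothetical stand-in for one Trimmomatic run on the
        ORIGINAL input of `dataset`, clipping the given k-mers in addition to
        the user's trimmers.
    screen(trimmed): hypothetical stand-in for a FastQC k-mer screen; returns
        the k-mers it reports as overrepresented.
    """
    global_kmers: Set[str] = set()
    for _ in range(max_rounds):
        found: Set[str] = set()
        for dataset in datasets:
            # Each round starts again from the original input files.
            trimmed = trim(dataset, frozenset(global_kmers))
            found.update(screen(trimmed))
        if not found:
            return global_kmers  # no overrepresentation detected: done
        global_kmers |= found  # grow the global list and trim again
    # Safeguard added for this sketch (not described in the README).
    raise RuntimeError(f"k-mers still overrepresented after {max_rounds} rounds")
```

Because every round re-trims the original files with the whole global list, a data set only needs to be screened clean once after the last new *k*-mer has been added.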

## Dependencies

- [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic)
- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

__Optional__
- [MultiQC](http://multiqc.info/)
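
A minimal sketch of how the `$PATH` detection could work, using Python's `shutil.which`. The `overrides` mapping plays the role of `-tp`, `-fqcp` and `-mqcp`; treating `java`, `trimmomatic` and `fastqc` as required and only `multiqc` as optional is an assumption based on the lists above, not taken from autotrim's code.

```python
import shutil
from typing import Dict, Mapping, Optional

# Assumption: java, trimmomatic and fastqc are required; multiqc is optional.
REQUIRED = ("java", "trimmomatic", "fastqc")
OPTIONAL = ("multiqc",)


def resolve_tools(
    overrides: Mapping[str, str], path: Optional[str] = None
) -> Dict[str, Optional[str]]:
    """Map each tool name to a path, or None if it could not be found.

    overrides: explicit paths, playing the role of -tp / -fqcp / -mqcp.
    path: search path for shutil.which; None means the current $PATH.
    """
    tools: Dict[str, Optional[str]] = {}
    for name in REQUIRED + OPTIONAL:
        # An explicitly given path wins over the $PATH lookup.
        tools[name] = overrides.get(name) or shutil.which(name, path=path)
    missing = [name for name in REQUIRED if tools[name] is None]
    if missing:
        raise FileNotFoundError("not found in $PATH: " + ", ".join(missing))
    return tools  # tools["multiqc"] is None -> skip the MultiQC report
```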

## Usage
### Prerequisites

### Output
Fastq output files are created in the same directory as the corresponding input file.
The `.f(ast)q(.gz)` file extension is removed and `_autotrim.fq` is appended to the file name for single-end data, or `_autotrim.paired.fq` and `_autotrim.unpaired.fq` for paired-end data.
FastQC reports are created for single-end input and, for paired-end input, for the paired output fastq files (not for the unpaired output).
Standard output (`*.trimmomatic_log`) and standard error (`*.trimmomatic_err`) of each Trimmomatic run are saved in the same folder as the input.
The MultiQC report is placed in the root directory if `-d` is used, or in the directory of the log and overrepresented *k*-mers files if `-fofn` is used.
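
The naming rule can be expressed as a small Python helper. `output_names` is a hypothetical function for illustration, and the case-sensitive extension match is an assumption.

```python
import re
from typing import List

# Matches .fq, .fastq, .fq.gz and .fastq.gz at the end of the name
# (case-sensitive: an assumption of this sketch).
FASTQ_EXT = re.compile(r"\.f(?:ast)?q(?:\.gz)?$")


def output_names(input_path: str, paired: bool) -> List[str]:
    """Derive the autotrim output file name(s) from one input fastq path."""
    stem = FASTQ_EXT.sub("", input_path)
    if paired:
        return [stem + "_autotrim.paired.fq", stem + "_autotrim.unpaired.fq"]
    return [stem + "_autotrim.fq"]
```

For example, `data/s1.fastq.gz` from a single-end set becomes `data/s1_autotrim.fq`, in the same directory as the input.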

```
autotrim.pl [-d <root_dir> | -fofn <file_of_file_names> -log <dir_to_place_the_log-file>]
-to <trimmomatic_options> -trim <trimmomatic_trimmer>

Options: [default]
-d STR          Root directory for automatic input of multiple data sets. The file containing
                overrepresented k-mers (kmer.fa) and the autotrim log file (autotrim.log) will be
                saved in this directory.
-fofn STR       File of file names containing tab-separated paths to one data set per line.
-log STR        Path in which to save the file containing overrepresented k-mers (kmer.fa) and
                the autotrim log file (autotrim.log) when using -fofn.
-to STR         A file containing the Trimmomatic options to use. All options need to be on
                the first line of the file.
                To create a trimlog for every data set, write "-trimlog" without a path; it will
                be saved in the same folder as the single or "_1" input file.
-tt INT         Trimmomatic threads. Specify either within -to or with -tt. [1]
-trim STR       A file containing the Trimmomatic trimmers on the first line, in the order
                they should be executed.
                If trimming overrepresented k-mers, the ILLUMINACLIP step for them will be
                inserted after the last specified "ILLUMINACLIP".
-k INT          K-mer length (between 2 and 10) used to screen for overrepresentation with
                FastQC. [7]
-nok            No overrepresentation screening of k-mers. Trim each set once and create a
                FastQC report. Recommended for RNA-seq. [off]
-v              Verbose. Print executed commands of Trimmomatic, FastQC and MultiQC to STDOUT
                and the log file. [off]
-tp STR         Trimmomatic path. The full path to the Trimmomatic jar file. Specify if
                "trimmomatic" is not in your $PATH.
-fqcp STR       FastQC path. The full path to the FastQC executable. Specify if "fastqc" is not
                in your $PATH.
-mqcp STR       MultiQC path. The full path to the MultiQC executable. Specify if "multiqc" is
                not in your $PATH and you want MultiQC executed automatically.
-rn             Rename files according to the folder they are placed in. [off]
-version        Print version number and exit.
-h or -help     Print this help and exit.
```
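
A sketch of the ILLUMINACLIP insertion described for `-trim`, assuming the trimmers are whitespace-separated on the first line of the file. The clipping parameters `2:30:10` and placing the step first when no ILLUMINACLIP is given are assumptions for illustration; the README specifies neither. `kmer.fa` is the file autotrim writes to the root or `-log` directory.

```python
from typing import List


def read_trimmers(trim_file: str) -> List[str]:
    """Read the whitespace-separated trimmers from the first line of a -trim file."""
    with open(trim_file) as handle:
        return handle.readline().split()


def insert_kmer_clip(
    trimmers: List[str], kmer_fasta: str, params: str = "2:30:10"
) -> List[str]:
    """Return a new list with an ILLUMINACLIP step for the k-mer FASTA file.

    The step is placed right after the last user-given ILLUMINACLIP.
    Assumptions: the default params, and putting the step first if the
    user gave no ILLUMINACLIP at all.
    """
    step = f"ILLUMINACLIP:{kmer_fasta}:{params}"
    last = -1
    for index, trimmer in enumerate(trimmers):
        if trimmer.startswith("ILLUMINACLIP:"):
            last = index
    return trimmers[: last + 1] + [step] + trimmers[last + 1 :]
```

With a first line of `ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36`, the *k*-mer step `ILLUMINACLIP:kmer.fa:2:30:10` lands directly after the TruSeq3 step, and the remaining trimmers keep their order.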

## Citation

In addition, please cite the dependencies:

- Trimmomatic: Bolger AM, Lohse M, Usadel B (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. *Bioinformatics*, 30(15):2114–2120. [https://doi.org/10.1093/bioinformatics/btu170](https://doi.org/10.1093/bioinformatics/btu170)
- FastQC: Andrews S (2010). FastQC: a quality control tool for high throughput sequence data. [http://www.bioinformatics.babraham.ac.uk/projects/fastqc](http://www.bioinformatics.babraham.ac.uk/projects/fastqc)
- MultiQC: Ewels P, Magnusson M, Lundin S, Käller M (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. *Bioinformatics*, 32(19):3047–3048. [https://doi.org/10.1093/bioinformatics/btw354](https://doi.org/10.1093/bioinformatics/btw354)
