Jared Simpson edited this page Dec 4, 2013 · 9 revisions

sga comes with a quality control and data exploration module, preqc. It estimates sequence coverage, per-base error rates, genome size, heterozygosity and repeat content. It is highly recommended to run this module on your data to better understand how difficult the assembly will be. Once you have produced the preqc PDF report, feel free to share it on the sga-users mailing list and ask for advice on how best to proceed with the assembly.

A full description can be found in this announcement post on the sga mailing list:

https://groups.google.com/forum/#!msg/sga-users/95dTwpJCARU/oKoq54EZqKwJ

A preprint of the preqc manuscript is available on arXiv:

http://arxiv.org/abs/1307.8026

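To generate a preqc report for your data, run these four commands in order, substituting your own paired-end read files for reads_R1.fastq and reads_R2.fastq (-t sets the number of threads). The final command plots your results alongside the example .preqc files included in the SGA source tree: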

sga preprocess --pe-mode 1 reads_R1.fastq reads_R2.fastq > mygenome.fastq
sga index -a ropebwt --no-reverse -t 8 mygenome.fastq
sga preqc -t 8 mygenome.fastq > mygenome.preqc
sga-preqc-report.py mygenome.preqc sga/src/examples/*.preqc
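If you run preqc on several datasets, the four commands can be wrapped in a small shell function. This is a minimal sketch, not part of SGA: the function name run_preqc, its argument order, the default of 8 threads and the default examples directory sga/src/examples are assumptions based on the commands above, so adjust them for your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SGA) that runs the four preqc steps in order.
# The function name, argument order and defaults are illustrative assumptions.
#
# Usage: run_preqc READS_R1 READS_R2 PREFIX [THREADS] [EXAMPLES_DIR]
# The body runs in a subshell, so the strict-mode options below do not leak
# into the calling shell.
run_preqc() (
    set -euo pipefail

    r1=${1:?missing READS_R1}
    r2=${2:?missing READS_R2}
    prefix=${3:?missing PREFIX}
    threads=${4:-8}                    # default matches -t 8 above
    examples=${5:-sga/src/examples}    # example .preqc files (assumed location)

    # Stop early if either tool is not on PATH.
    for tool in sga sga-preqc-report.py; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "run_preqc: '$tool' not found on PATH" >&2
            exit 1
        fi
    done

    # 1. Preprocess the paired reads into a single FASTQ file.
    sga preprocess --pe-mode 1 "$r1" "$r2" > "$prefix.fastq"
    # 2. Build the FM-index with ropebwt, skipping the reverse index.
    sga index -a ropebwt --no-reverse -t "$threads" "$prefix.fastq"
    # 3. Compute the preqc statistics.
    sga preqc -t "$threads" "$prefix.fastq" > "$prefix.preqc"
    # 4. Build the report, plotting this dataset next to the example genomes.
    sga-preqc-report.py "$prefix.preqc" "$examples"/*.preqc
)
```

After sourcing the file, `run_preqc reads_R1.fastq reads_R2.fastq mygenome` runs the same four commands as above with 8 threads and sga/src/examples as the examples directory. The function body uses parentheses rather than braces on purpose: `set -euo pipefail` then applies only inside the function, so the first failing step aborts the run without closing an interactive shell.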