Jared Simpson edited this page Mar 5, 2014 · 17 revisions

Frequently asked questions

  1. What depth of coverage do I need?

    At a minimum, 20-30X coverage is required to get a usable result from a de novo assembly of short reads. I usually request 40X.
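
To check whether a data set reaches the depth mentioned above, a rough estimate is total sequenced bases divided by genome size. The sketch below is illustrative only: the function names and the example numbers (a 100 Mbp genome, 100bp reads) are assumptions and are not part of SGA.

```python
import math


def coverage_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches target_depth on average."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_depth * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical data set: 40 million 100bp reads, 100 Mbp genome.
    print(coverage_depth(40_000_000, 100, 100_000_000))
    # Reads required to hit 40X for the same genome and read length.
    print(reads_needed(40, 100, 100_000_000))
```

This gives an average only; actual depth varies along the genome, so it is sensible to aim above the 20-30X minimum.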

  2. Can I assemble very short (<50bp) reads with SGA?

    This is not recommended. For very short reads, a de Bruijn graph assembler will give results that are as good or better, with lower run time. SGA is currently designed for reads of 100bp or longer.

  3. The sga assemble/fm-merge steps are taking too much time/memory.

    The usual cause is an overlap length that is too short. If the overlap length is very short, many edges will be created in the graph between reads that have short repeats at their ends. The suggested parameters for 100bp reads are overlap length 65 for fm-merge and overlap length 75 for sga assemble. Try this parameter combination first, then tweak the overlap parameter in sga assemble to try to improve the assembly.
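
As a concrete illustration of the suggested starting point, the sketch below builds the two command lines using the `-m` (minimum overlap) option. The file names `reads.fa`, `merged.fa`, `merged.asqg.gz` and the output prefix `assembly` are hypothetical, and the steps that sit between fm-merge and assemble in a full pipeline (such as indexing and computing overlaps) are left out.

```python
import shlex

# Suggested starting values for 100bp reads (from the answer above).
FM_MERGE_MIN_OVERLAP = 65
ASSEMBLE_MIN_OVERLAP = 75


def fm_merge_cmd(reads, out, min_overlap=FM_MERGE_MIN_OVERLAP):
    """Argument list for `sga fm-merge` with a minimum overlap length."""
    return ["sga", "fm-merge", "-m", str(min_overlap), "-o", out, reads]


def assemble_cmd(asqg, prefix, min_overlap=ASSEMBLE_MIN_OVERLAP):
    """Argument list for `sga assemble` with a minimum overlap length."""
    return ["sga", "assemble", "-m", str(min_overlap), "-o", prefix, asqg]


if __name__ == "__main__":
    # File names are placeholders; substitute the ones from your pipeline.
    print(shlex.join(fm_merge_cmd("reads.fa", "merged.fa")))
    print(shlex.join(assemble_cmd("merged.asqg.gz", "assembly")))
```

Keeping the overlap lengths as named constants makes it easy to vary only the `sga assemble` value when tuning, as the answer recommends.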

  4. I want to assemble a very large genome. How do I efficiently index the reads?

    See this page, which describes the construction algorithm for large data sets.

  5. How do I build scaffolds using multiple libraries?

    See here

  6. What parameters should I tune to improve my assembly?

    See here

  7. Can I use bwa mem during scaffolding?

    Yes, but you need to remove secondary alignments from the bam file. See this mailing list thread.
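
In the SAM format, secondary alignments are marked by the 0x100 bit of the FLAG field, so removing them amounts to dropping records with that bit set. The sketch below does this for SAM text lines; the function names are illustrative, and it is not necessarily the procedure described in the mailing list thread.

```python
SECONDARY = 0x100  # SAM FLAG bit meaning "secondary alignment"


def is_secondary(sam_line: str) -> bool:
    """True if an alignment record has the secondary-alignment bit set."""
    fields = sam_line.rstrip("\n").split("\t")
    return bool(int(fields[1]) & SECONDARY)  # FLAG is the second column


def drop_secondary(lines):
    """Yield header lines and every alignment that is not secondary."""
    for line in lines:
        if line.startswith("@") or not is_secondary(line):
            yield line


if __name__ == "__main__":
    # Tiny made-up SAM fragment: one header, one primary, one secondary.
    sam = [
        "@SQ\tSN:contig1\tLN:1000",
        "read1\t0\tcontig1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "read1\t256\tcontig1\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    for kept in drop_secondary(sam):
        print(kept)
```

For BAM files, the same filter is commonly applied on the command line with `samtools view -b -F 256`.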

  8. What is an ASQG file?

    See here

  9. Where can I get further help or advice?

    There is a mailing list/Google group for sga. You can sign up [here](http://groups.google.com/group/sga-users).