jts edited this page Sep 20, 2011 · 17 revisions

SGA is a de novo assembler designed to assemble large genomes from high-coverage short-read data. It is built as a modular set of programs that are combined to form an assembly pipeline. A description of the SGA design is found here.

When first learning SGA, it is highly recommended to run one of the example assemblies from the src/examples directory to become familiar with the flow of data through the program. A page containing frequently asked questions can be found here.

The major subcommands of SGA are:

  • preprocess - Prepare a set of sequence reads for assembly
  • index - Build the FM index for a set of sequence reads
  • merge - Merge two indices together. This can be used to build a distributed indexing pipeline.
  • overlap - Find overlaps between reads to construct a string graph
  • fm-merge - Efficiently merge reads that can be unambiguously assembled
  • correct - Correct base calling errors in a set of reads
  • filter - Remove duplicate and low-quality sequences
  • assemble - Construct contigs from a string graph

Detailed usage information for each command is printed by the --help option. For example, this command will print the options for the index subprogram:

  sga index --help
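
To review the options of every subcommand in one pass, the same --help call can be wrapped in a loop. The following is a minimal sketch: the helper name sga_help_all is hypothetical (it is not part of SGA), and the subcommand names are simply those listed above. If sga is not on the PATH, only the header lines are printed.

```shell
# Subcommand names as listed on this page.
sga_subcommands="preprocess index merge overlap fm-merge correct filter assemble"

# Hypothetical helper: print a header and the --help text for each subcommand.
# The sga call is skipped when the binary is not installed.
sga_help_all() {
    for cmd in $sga_subcommands; do
        echo "== sga $cmd --help =="
        if command -v sga >/dev/null 2>&1; then
            sga "$cmd" --help || true
        fi
    done
}
```

Running sga_help_all | less pages through the combined output, with each "== sga ... ==" header marking where a subcommand's help text begins.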