Home

SGA is a de novo assembler designed to assemble large genomes from high-coverage short-read data. It is built as a modular set of programs that are chained together to form an assembly pipeline. A description of the SGA design can be found here.

When first learning SGA, it is highly recommended to run one of the example assemblies from the src/examples directory to become familiar with the flow of data through the program. A page containing frequently asked questions can be found here.

The major subcommands of SGA are:

preprocess - Prepare a set of sequence reads for assembly
index - Build the FM index for a set of sequence reads
merge - Merge two indices together. This can be used to build a distributed indexing pipeline.
overlap - Find overlaps between reads to construct a string graph
correct - Correct base calling errors in a set of reads
rmdup - Remove duplicate sequences
qc - Remove possible chimeric reads
assemble - Construct contigs from a string graph
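The subcommands above are normally run in sequence. The script below is a minimal sketch of that flow and is not a copy of the scripts in src/examples. It assumes single-end reads in a hypothetical file reads.fastq. Apart from output names, it uses default options. It also relies on SGA's default output file names: rmdup is assumed to write reads.ec.rmdup.fa, and overlap is assumed to write reads.ec.rmdup.asqg.gz. Confirm each step with `sga <command> --help` before relying on it.

```shell
#!/bin/sh
# Minimal sketch of an SGA pipeline using the subcommands listed above.
# Assumptions: single-end reads in reads.fastq (hypothetical name),
# default options, and SGA's default output file names for rmdup/overlap.
set -e

# Hypothetical helper: print each command; only execute it when SGA_RUN=1.
run() {
    echo "+ $*"
    if [ "${SGA_RUN:-0}" = 1 ]; then
        "$@"
    fi
}

# Prepare the raw reads for assembly.
run sga preprocess -o reads.pp.fastq reads.fastq
# Build the FM index of the preprocessed reads (needed by correct).
run sga index reads.pp.fastq
# Correct base-calling errors.
run sga correct -o reads.ec.fa reads.pp.fastq
# Index the corrected reads, then remove duplicate sequences.
run sga index reads.ec.fa
run sga rmdup reads.ec.fa
# Index the deduplicated reads and find overlaps to build the string graph.
run sga index reads.ec.rmdup.fa
run sga overlap reads.ec.rmdup.fa
# Construct contigs from the string graph.
run sga assemble -o assembly reads.ec.rmdup.asqg.gz
```

By default the hypothetical `run` helper only prints each command, so the plan can be reviewed before any long-running step starts. Setting `SGA_RUN=1` (also a name invented for this sketch) executes the commands. The optional qc step and the finer tuning used in the bundled examples are left out.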

Detailed usage information for each command is printed by the --help option. For example, this command prints the options for the index subprogram:

  sga index --help
