
Indexing large data sets

jts edited this page Aug 20, 2012 · 1 revision

SGA version 0.9.31 and later include a very efficient algorithm for indexing large collections of short reads. The code was written by Heng Li (https://github.com/lh3/ropebwt). To use it, pass the -a ropebwt option to sga index. With this algorithm, 1.5 billion 100bp reads can be indexed in under 64GB of memory, so you should be able to index all of your data in a single process:

sga preprocess *.fastq > all.fastq
sga index -a ropebwt [--no-reverse] -t 4 all.fastq
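To put the memory figure quoted above in perspective, here is a rough back-of-the-envelope calculation. It assumes 1 GB = 10^9 bytes and simply divides the stated memory ceiling by the total number of input bases. It is only a sketch of what the quoted figure implies on average, not a measurement, and it says nothing about how memory is split between the forward and reverse indices:

$$
\begin{aligned}
N_{\text{bases}} &= 1.5\times10^{9}\ \text{reads} \times 100\ \text{bp} = 1.5\times10^{11}\ \text{bases} \\
M &< 64\ \text{GB} = 6.4\times10^{10}\ \text{bytes} = 5.12\times10^{11}\ \text{bits} \\
\frac{M}{N_{\text{bases}}} &< \frac{5.12\times10^{11}\ \text{bits}}{1.5\times10^{11}\ \text{bases}} \approx 3.4\ \text{bits per base} \;(\approx 0.43\ \text{bytes per base})
\end{aligned}
$$

In other words, the whole indexing run fits in less than half a byte per input base.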