Lightweight String Graph Construction
This software is in its initial stage of development. Please contact Marco Previtali with any questions, or use the issue tracker to report bugs.
The software is composed of three tools: lsg (light (string) overlap graph), redbuild (string graph build), and graph2asqg (native graph format to ASQG).
To build all the tools, move to the root directory of the project and run make all.
If you want to try out the software on some (simulated or real) reads, follow these steps:
- given a FASTA file a.fa containing n reads, produce another FASTA file b.fa containing 2n reads such that the reads in positions 1 to n are the same as those in a.fa and the reads in positions n+1 to 2n are their reverse complements (for 1 <= i <= n, the read in position n+i must be the reverse complement of the read in position i)
- download and compile BEETL (please note that this is not the original repository)
- compile LightStringGraph
cd <LSGPATH> && make all
- build the BWT of b.fa with
beetl-bwt -i b.fa -o <BWTPrefix> --output-format=ASCII --generate-lcp --generate-end-pos-file
- run LightStringGraph
<LSGPATH>/bin/lsg -B <BWTPrefix> -T <Tau> -C <CycNum>
where <Tau> is the minimum overlap length between reads and <CycNum> must satisfy <CycNum> >= <reads length> - <Tau>
- run redbuild
<LSGPATH>/bin/redbuild -b <BWTPrefix> -r b.fa -m <CycNum>+1
where the argument of -m is the value of <CycNum> plus one (e.g., 51 if <CycNum> is 50)
- optionally run graph2asqg
<LSGPATH>/bin/graph2asqg -b <BWTPrefix> -r b.fa -l <readsLength>
and redirect its standard output (the string graph in ASQG format) to a file; the output can be compressed on the fly, e.g., by piping it through gzip.
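The first step (building b.fa from a.fa) and the constraint on <CycNum> can be sketched in shell as follows. This is a minimal sketch, not part of LightStringGraph: the helper names double_with_revcomp and min_cycnum are hypothetical, the script assumes every read in a.fa has its sequence on a single line (no wrapped FASTA), and the "_rc" suffix added to the headers of the reverse-complemented reads is an arbitrary choice.

```shell
# Hypothetical helper (not part of LightStringGraph): write to $2 the reads of
# FASTA file $1 followed by their reverse complements, in the same order, so
# that read n+i of the output is the reverse complement of read i.
# Assumes one sequence line per read (no line wrapping).
double_with_revcomp() {
  local in="$1" out="$2"
  # Reads 1..n: copy the input unchanged (awk also normalizes the final newline).
  awk '{ print }' "$in" > "$out" || return 1
  # Reads n+1..2n: reverse complement of each read, same order as the input.
  awk '
    BEGIN {
      comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N"
      comp["a"] = "t"; comp["c"] = "g"; comp["g"] = "c"; comp["t"] = "a"; comp["n"] = "n"
    }
    /^>/ { hdr = $0; next }          # remember the header of the current read
    length($0) == 0 { next }         # skip blank lines
    {
      rc = ""
      for (i = length($0); i >= 1; i--) {
        c = substr($0, i, 1)
        # Complement known bases; copy any other symbol unchanged.
        rc = rc ((c in comp) ? comp[c] : c)
      }
      print hdr "_rc"                # "_rc" suffix is an arbitrary choice
      print rc
    }' "$in" >> "$out"
}

# Hypothetical helper: smallest <CycNum> allowed by the constraint
# <CycNum> >= <reads length> - <Tau>, given $1 = reads length, $2 = Tau.
min_cycnum() {
  echo $(( $1 - $2 ))
}
```

For example, after sourcing the functions, double_with_revcomp a.fa b.fa produces b.fa, and cyc=$(min_cycnum 100 30) yields 70 for 100 bp reads with a minimum overlap of 30; under the reading that -m expects <CycNum> plus one, redbuild would then receive -m "$((cyc + 1))".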
If lsg crashes with a logic error, try raising the limit on the maximum number of open file descriptors for the user running it (for example, with the bash built-in ulimit -n) and delete all the *.tmplsg.* files before running lsg again.
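The recovery steps above can be wrapped in a small shell function. This is a sketch under stated assumptions: the name prepare_lsg_retry is hypothetical, the *.tmplsg.* files are assumed to live in the directory passed as the first argument (typically the one lsg was run from), and the new limit only applies to the current shell session, so the function must be sourced rather than run as a separate script.

```shell
# Hypothetical helper: prepare the environment for re-running lsg after a
# logic error. $1 = directory holding the leftover *.tmplsg.* files (assumed),
# $2 = desired soft limit on the number of open file descriptors.
prepare_lsg_retry() {
  local dir="$1" want="$2" soft hard
  soft=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  # An unprivileged user cannot raise the soft limit past the hard limit.
  if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
    want="$hard"
  fi
  # Only raise the soft limit; leave it alone if it is already high enough.
  if [ "$soft" != "unlimited" ] && [ "$want" -gt "$soft" ]; then
    ulimit -Sn "$want" || return 1
  fi
  # Delete the temporary files left behind by the crashed run.
  rm -f -- "$dir"/*.tmplsg.*
}
```

A typical use would be prepare_lsg_retry . 4096 && <LSGPATH>/bin/lsg -B <BWTPrefix> -T <Tau> -C <CycNum>, where 4096 is only an example value. Capping the request at the hard limit avoids a failing ulimit call; going beyond it requires administrator changes (e.g., in the system's limits configuration).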