Jared Simpson edited this page Mar 5, 2014 · 12 revisions

The ASQG format describes an assembly graph. Each line is a tab-delimited record. The first field describes the record type. The three record types are:

  1. HT - Header information. This record contains metadata tags for the file version (VN tag) and parameters associated with the graph (for instance, the minimum overlap length).
  2. VT - Vertex records. These records contain the vertex identifier and its associated sequence.
  3. ED - Edge description records. The second field describes a pair of overlapping reads. Subsequent fields can have SAM-style tags containing additional data, like a CIGAR string.
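
The record layout above can be sketched as a small line parser. This is a minimal illustration, not part of any official tooling: the function names `parse_tag` and `parse_record` are hypothetical, and the code assumes tags follow the SAM-style KEY:TYPE:VALUE pattern with `i` for integers, `f` for floats, and every other type kept as a string.

```python
def parse_tag(tag):
    """Split a SAM-style tag such as 'OL:i:45' into (key, typed value)."""
    # Split at most twice so a string value may itself contain ':'.
    key, type_code, value = tag.split(":", 2)
    if type_code == "i":
        return key, int(value)
    if type_code == "f":
        return key, float(value)
    return key, value  # assumption: 'Z' and any other type stay as strings


def parse_record(line):
    """Parse one tab-delimited ASQG line into a dict keyed by record type."""
    fields = line.rstrip("\n").split("\t")
    record_type = fields[0]
    if record_type == "HT":
        return {"type": "HT", "tags": dict(parse_tag(t) for t in fields[1:])}
    if record_type == "VT":
        return {"type": "VT", "id": fields[1], "sequence": fields[2],
                "tags": dict(parse_tag(t) for t in fields[3:])}
    if record_type == "ED":
        # The overlap description is kept as a raw string here.
        return {"type": "ED", "overlap": fields[1],
                "tags": dict(parse_tag(t) for t in fields[2:])}
    raise ValueError(f"unknown record type: {record_type!r}")
```

Dispatching on the first field mirrors the format itself, so a record type the sketch does not know raises an error instead of being silently skipped.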

Edge descriptions

The second field of an ED record describes an overlap between a pair of reads. This field contains 10 elements:

  1. contig 1 name
  2. contig 2 name
  3. contig 1 overlap start (0 based)
  4. contig 1 overlap end (inclusive)
  5. contig 1 length
  6. contig 2 overlap start (0 based)
  7. contig 2 overlap end (inclusive)
  8. contig 2 length
  9. contig 2 is reverse (1 for reverse, 0 for forward)
  10. number of differences in overlap (0 for perfect overlaps, which is the default).
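
The ten elements listed above map naturally onto a named tuple. The sketch below is illustrative only: `Overlap`, `parse_overlap`, and `overlap_length` are hypothetical names, and it assumes the elements within the field are separated by whitespace. Because the end coordinates are inclusive, the overlap length is end - start + 1.

```python
from typing import NamedTuple


class Overlap(NamedTuple):
    name1: str
    name2: str
    start1: int       # 0-based
    end1: int         # inclusive
    length1: int
    start2: int       # 0-based
    end2: int         # inclusive
    length2: int
    is_reverse: bool  # True when contig 2 is reverse
    num_diff: int     # 0 for a perfect overlap


def parse_overlap(field):
    """Parse the second field of an ED record (assumed whitespace-separated)."""
    parts = field.split()
    if len(parts) != 10:
        raise ValueError(f"expected 10 elements, got {len(parts)}")
    nums = [int(x) for x in parts[2:]]
    return Overlap(parts[0], parts[1], nums[0], nums[1], nums[2],
                   nums[3], nums[4], nums[5], bool(nums[6]), nums[7])


def overlap_length(start, end):
    """Number of bases covered by an inclusive [start, end] interval."""
    return end - start + 1
```

For instance, `parse_overlap("read2 read1 0 46 50 3 49 50 0 0")` yields an overlap covering 47 bases on each sequence.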

Example

HT VN:i:1 ER:f:0 OL:i:45 IN:Z:reads.fa CN:i:1 TE:i:0
VT read1 GATCGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGG SS:i:0
VT read2 CGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATA SS:i:0
VT read3 ATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATATT SS:i:0
ED read2 read1 0 46 50 3 49 50 0 0
ED read3 read2 0 47 50 2 49 50 0 0
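
As a sanity check on the coordinate convention, the two edges in this example can be compared against the vertex sequences. This is a minimal sketch with hypothetical names (`reads`, `edges`, `overlap_matches`). It handles only forward overlaps with zero differences, which is what both example edges declare.

```python
# Vertex sequences copied from the VT records in the example.
reads = {
    "read1": "GATCGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGG",
    "read2": "CGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATA",
    "read3": "ATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATATT",
}

# (name1, name2, start1, end1, start2, end2) taken from the ED records.
edges = [
    ("read2", "read1", 0, 46, 3, 49),
    ("read3", "read2", 0, 47, 2, 49),
]


def overlap_matches(reads, edge):
    """True when the two overlapping substrings are identical."""
    name1, name2, start1, end1, start2, end2 = edge
    # End coordinates are inclusive, so add 1 for Python's half-open slices.
    return reads[name1][start1:end1 + 1] == reads[name2][start2:end2 + 1]


results = [overlap_matches(reads, edge) for edge in edges]
```

Both entries of `results` are `True`: the first 47 bases of read2 equal the last 47 bases of read1, and the first 48 bases of read3 equal the last 48 bases of read2.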