Jared Simpson edited this page Mar 5, 2014 · 12 revisions

The ASQG format describes an assembly graph. Each line is a tab-delimited record. The first field describes the record type. The three record types are:

  1. HT - Header information. This record contains metadata tags for the file version (VN tag) and parameters associated with the graph (for instance the minimum overlap length).
  2. VT - Vertex records. These records contain the vertex identifier and its associated sequence.
  3. ED - Edge description records. The second field describes a pair of overlapping reads. Subsequent fields can have SAM-style tags containing additional data, like a CIGAR string.
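The record layout above could be read with a small dispatcher such as the following. This is only a sketch: the function names `parse_tag` and `parse_record` are hypothetical, and it assumes tags follow the SAM-style `KEY:TYPE:VALUE` shape with type codes `i` (integer), `f` (float) and `Z` (string), as seen in the example records on this page.

```python
def parse_tag(tag):
    """Split a SAM-style tag such as 'OL:i:45' into (key, typed value).

    Assumption: only the type codes i, f and Z are handled; anything
    else is kept as a plain string.
    """
    key, type_code, value = tag.split(":", 2)
    if type_code == "i":
        return key, int(value)
    if type_code == "f":
        return key, float(value)
    return key, value


def parse_record(line):
    """Parse one tab-delimited ASQG line into (record_type, data)."""
    fields = line.rstrip("\n").split("\t")
    record_type = fields[0]
    if record_type == "HT":
        # Header: every remaining field is a tag.
        return record_type, {"tags": dict(parse_tag(t) for t in fields[1:])}
    if record_type == "VT":
        # Vertex: identifier, sequence, then optional tags.
        return record_type, {
            "id": fields[1],
            "sequence": fields[2],
            "tags": dict(parse_tag(t) for t in fields[3:]),
        }
    if record_type == "ED":
        # Edge: the overlap description is kept as a raw string here,
        # followed by optional tags.
        return record_type, {
            "overlap": fields[1],
            "tags": dict(parse_tag(t) for t in fields[2:]),
        }
    raise ValueError("unknown ASQG record type: %r" % record_type)
```

Keeping the overlap description as a raw string leaves its interpretation to a separate step, so the dispatcher only needs to know about the three record types.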

Edge descriptions

The second field of an ED record describes an overlap between a pair of reads. The field contains 10 space-separated elements, listed below (the two reads are referred to as contig 1 and contig 2):

  1. contig 1 name
  2. contig 2 name
  3. contig 1 overlap start (0 based)
  4. contig 1 overlap end (inclusive)
  5. contig 1 length
  6. contig 2 overlap start (0 based)
  7. contig 2 overlap end (inclusive)
  8. contig 2 length
  9. contig 2 is reverse (1 for reverse, 0 for forward)
  10. number of differences in overlap (0 for perfect overlaps, which is the default).
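The ten elements listed above map naturally onto a small record type. The sketch below is illustrative only: the names `Overlap`, `parse_overlap` and `overlap_lengths` are hypothetical, and it assumes the elements are separated by whitespace, as in the example records on this page.

```python
from typing import NamedTuple


class Overlap(NamedTuple):
    name1: str
    name2: str
    start1: int       # 0-based start on contig 1
    end1: int         # inclusive end on contig 1
    length1: int      # full length of contig 1
    start2: int       # 0-based start on contig 2
    end2: int         # inclusive end on contig 2
    length2: int      # full length of contig 2
    is_reverse: bool  # True if contig 2 is reversed
    num_diff: int     # number of differences in the overlap


def parse_overlap(field):
    """Parse the second field of an ED record into an Overlap."""
    parts = field.split()
    if len(parts) != 10:
        raise ValueError("expected 10 elements, got %d" % len(parts))
    name1, name2 = parts[0], parts[1]
    s1, e1, l1, s2, e2, l2, rc, nd = (int(p) for p in parts[2:])
    return Overlap(name1, name2, s1, e1, l1, s2, e2, l2, bool(rc), nd)


def overlap_lengths(ov):
    """Length of the overlapping region on each contig.

    The end coordinates are inclusive, hence the +1.
    """
    return ov.end1 - ov.start1 + 1, ov.end2 - ov.start2 + 1
```

For the description `read2 read1 0 46 50 3 49 50 0 0`, `overlap_lengths` gives 47 bases on each read: positions 0 to 46 of read2 and positions 3 to 49 of read1.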

Example

HT	VN:i:1	ER:f:0	OL:i:45	IN:Z:reads.fa	CN:i:1	TE:i:0
VT	read1	GATCGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGG	SS:i:0
VT	read2	CGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATA	SS:i:0
VT	read3	ATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATATT	SS:i:0
ED	read2 read1 0 46 50 3 49 50 0 0
ED	read3 read2 0 47 50 2 49 50 0 0
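As a sanity check on the coordinates, the overlapping substrings named by each ED record in this example can be extracted and compared. The helpers below (`revcomp`, `overlap_substrings`) are hypothetical names for a sketch, not part of any tool. The handling of the reverse flag is an assumption: the sketch takes the coordinates to refer to the sequence as stored in the VT record and reverse-complements the contig 2 substring when the flag is 1. Both edges in the example are forward, so that branch is not exercised here.

```python
def revcomp(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def overlap_substrings(seqs, ed_field):
    """Return the overlapping substrings of the two contigs in an ED field.

    seqs maps vertex identifiers to sequences (from the VT records).
    End coordinates are inclusive, so slices run to end + 1.
    """
    parts = ed_field.split()
    name1, name2 = parts[0], parts[1]
    s1, e1, _, s2, e2, _, rc, _ = (int(p) for p in parts[2:10])
    sub1 = seqs[name1][s1:e1 + 1]
    sub2 = seqs[name2][s2:e2 + 1]
    if rc:
        # Assumption: a reversed contig 2 is compared as its reverse complement.
        sub2 = revcomp(sub2)
    return sub1, sub2


# Sequences and edges copied from the example records.
seqs = {
    "read1": "GATCGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGG",
    "read2": "CGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATA",
    "read3": "ATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATATT",
}
edges = [
    "read2 read1 0 46 50 3 49 50 0 0",
    "read3 read2 0 47 50 2 49 50 0 0",
]
for edge in edges:
    a, b = overlap_substrings(seqs, edge)
    print(edge.split()[:2], len(a), a == b)
```

Both edges have 0 differences, so the two substrings in each pair should be identical: a prefix of the first read matches a suffix of the second.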