ASQG Format
Jared Simpson edited this page Mar 5, 2014 · 12 revisions
The ASQG format describes an assembly graph. Each line is a tab-delimited record. The first field describes the record type. The three record types are:
- HT - Header information. This record contains metadata tags for the file version (VN tag) and parameters associated with the graph (for instance the minimum overlap length).
- VT - Vertex records. These records contain the vertex identifier and its associated sequence.
- ED - Edge description records. The second field describes a pair of overlapping reads. Subsequent fields can have SAM-style tags containing additional data, like a CIGAR string.
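As a rough illustration of the record layout described above, the following Python sketch splits a line into its record type and remaining fields, and decodes SAM-style tags. The function names `parse_record` and `parse_tag` are hypothetical (they are not part of any ASQG tooling), and the sketch assumes tags follow the `NAME:TYPE:VALUE` pattern with `i` for integers, `f` for floats and `Z` for strings.

```python
def parse_tag(token):
    """Parse one SAM-style tag such as 'OL:i:45' into (name, value).

    Assumption: tags look like NAME:TYPE:VALUE, with 'i' = integer,
    'f' = float and 'Z' = string, as in SAM optional fields.
    """
    name, type_code, raw = token.split(":", 2)
    if type_code == "i":
        return name, int(raw)
    if type_code == "f":
        return name, float(raw)
    return name, raw  # 'Z' and any unrecognised type stay as text


def parse_record(line):
    """Split one tab-delimited ASQG line into (record_type, fields)."""
    record_type, *fields = line.rstrip("\n").split("\t")
    if record_type not in ("HT", "VT", "ED"):
        raise ValueError(f"unknown ASQG record type: {record_type!r}")
    return record_type, fields


# A header line with tab-separated tags.
header_line = "HT\tVN:i:1\tER:f:0\tOL:i:45\tIN:Z:reads.fa\tCN:i:1\tTE:i:0"
record_type, fields = parse_record(header_line)
header_tags = dict(parse_tag(field) for field in fields)
print(record_type, header_tags)
```

Keeping the tag decoding separate from the line splitting means the same `parse_tag` helper can be reused for the optional tags that follow VT and ED records.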
The second field of an ED record describes an overlap between a pair of reads. This field contains 10 elements:
- contig 1 name
- contig 2 name
- contig 1 overlap start (0 based)
- contig 1 overlap end (inclusive)
- contig 1 length
- contig 2 overlap start (0 based)
- contig 2 overlap end (inclusive)
- contig 2 length
- contig 2 is reverse (1 for reverse, 0 for forward)
- number of differences in overlap (0 for perfect overlaps, which is the default).
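The ten elements listed above can be mapped onto a small record type. The sketch below is only an illustration: the `Overlap` class, its attribute names and the `parse_overlap` helper are hypothetical, and it assumes the ten elements inside the field are separated by whitespace. Because both end coordinates are inclusive, the overlap length on each sequence is `end - start + 1`.

```python
from typing import NamedTuple


class Overlap(NamedTuple):
    """The ten elements of an ED overlap field (attribute names are illustrative)."""
    name1: str
    name2: str
    start1: int       # 0-based
    end1: int         # inclusive
    length1: int
    start2: int       # 0-based
    end2: int         # inclusive
    length2: int
    is_reverse: bool  # True if sequence 2 is reversed
    num_diff: int     # 0 for a perfect overlap


def parse_overlap(field):
    """Parse the overlap field of an ED record.

    Assumption: the ten elements are whitespace-separated within the field.
    """
    parts = field.split()
    if len(parts) != 10:
        raise ValueError(f"expected 10 elements, got {len(parts)}")
    name1, name2 = parts[:2]
    start1, end1, length1, start2, end2, length2, reverse, num_diff = map(int, parts[2:])
    return Overlap(name1, name2, start1, end1, length1,
                   start2, end2, length2, bool(reverse), num_diff)


def overlap_lengths(ov):
    """Return the overlap length on each sequence (end coordinates are inclusive)."""
    return ov.end1 - ov.start1 + 1, ov.end2 - ov.start2 + 1


ov = parse_overlap("read2 read1 0 46 50 3 49 50 0 0")
print(ov.name1, ov.name2, overlap_lengths(ov))
```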
An example ASQG file:
HT VN:i:1 ER:f:0 OL:i:45 IN:Z:reads.fa CN:i:1 TE:i:0
VT read1 GATCGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGG SS:i:0
VT read2 CGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATA SS:i:0
VT read3 ATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATATT SS:i:0
ED read2 read1 0 46 50 3 49 50 0 0
ED read3 read2 0 47 50 2 49 50 0 0
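One way to sanity-check the edge records in this example is to extract the overlapping substrings from the vertex sequences and compare them. The sketch below does this for the two ED records shown; the helper name `check_edge` is hypothetical, the overlap elements are assumed to be whitespace-separated, and only forward overlaps (reverse flag 0) are handled, since the handling of reverse overlaps is not spelled out on this page.

```python
def check_edge(sequences, ed_field):
    """Return (overlap_length, mismatches) for one forward ED overlap field.

    Assumptions: elements are whitespace-separated; start is 0-based and
    end is inclusive, so the slice is seq[start:end + 1].
    """
    parts = ed_field.split()
    name1, name2 = parts[0], parts[1]
    start1, end1, len1, start2, end2, len2, is_reverse, _num_diff = map(int, parts[2:10])
    if is_reverse:
        raise NotImplementedError("reverse overlaps are not covered by this sketch")

    seq1, seq2 = sequences[name1], sequences[name2]
    if len(seq1) != len1 or len(seq2) != len2:
        raise ValueError("sequence length does not match the ED record")

    sub1 = seq1[start1:end1 + 1]
    sub2 = seq2[start2:end2 + 1]
    if len(sub1) != len(sub2):
        raise ValueError("overlap intervals differ in length")

    mismatches = sum(a != b for a, b in zip(sub1, sub2))
    return len(sub1), mismatches


# Vertex sequences and edge fields copied from the example above.
sequences = {
    "read1": "GATCGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGG",
    "read2": "CGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATA",
    "read3": "ATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATATT",
}
edges = [
    "read2 read1 0 46 50 3 49 50 0 0",
    "read3 read2 0 47 50 2 49 50 0 0",
]

results = [check_edge(sequences, field) for field in edges]
for field, (length, mismatches) in zip(edges, results):
    print(field, "-> overlap length", length, "mismatches", mismatches)
```

Both edges in the example are perfect overlaps, which agrees with the final element (number of differences) being 0 in each ED record.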