ASQG Format
Jared Simpson edited this page Mar 5, 2014 · 12 revisions
The ASQG format describes an assembly graph. Each line is a tab-delimited record. The first field in each record describes the record type. The three types are:
- HT: Header record. This record contains metadata tags for the file version (VN tag) and parameters associated with the graph (for example, the minimum overlap length).
- VT: Vertex records. The second field contains the vertex identifier and the third field contains the sequence. Subsequent fields contain optional tags.
- ED: Edge description records. The second field describes a pair of overlapping reads (see below). Subsequent fields contain optional tags.
Record tags follow the same format as SAM (TAG:TYPE:VALUE). An example file:
HT VN:i:1 ER:f:0 OL:i:45 IN:Z:reads.fa CN:i:1 TE:i:0
VT read1 GATCGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGG
VT read2 CGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATA
VT read3 ATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATATT
ED read2 read1 0 46 50 3 49 50 0 0
ED read3 read2 0 47 50 2 49 50 0 0
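As an illustration of the record layout, here is a minimal Python sketch that splits one line into its record type, fixed fields, and tags. The function names `parse_tag` and `parse_record` are hypothetical and not part of any ASQG tool. The sketch assumes fields are separated by tabs, as stated above, and that only the SAM tag types `i`, `f`, and `Z` need typed handling.

```python
def parse_tag(tag):
    """Split a SAM-style tag 'KEY:TYPE:VALUE' into (key, typed value)."""
    # Split at most twice so a string value may itself contain ':'.
    key, type_code, value = tag.split(":", 2)
    if type_code == "i":
        return key, int(value)
    if type_code == "f":
        return key, float(value)
    # 'Z' (string) and any other type code are kept as text (assumption).
    return key, value


def parse_record(line):
    """Parse one tab-delimited ASQG line into a dict."""
    fields = line.rstrip("\n").split("\t")
    record_type = fields[0]
    if record_type == "HT":
        return {"type": "HT", "tags": dict(parse_tag(t) for t in fields[1:])}
    if record_type == "VT":
        return {
            "type": "VT",
            "id": fields[1],
            "sequence": fields[2],
            "tags": dict(parse_tag(t) for t in fields[3:]),
        }
    if record_type == "ED":
        # The overlap description is kept as raw text here.
        return {
            "type": "ED",
            "overlap": fields[1],
            "tags": dict(parse_tag(t) for t in fields[2:]),
        }
    raise ValueError(f"unknown record type: {record_type!r}")
```

For example, `parse_record("HT\tVN:i:1\tOL:i:45")["tags"]` gives a dict mapping `VN` to the integer 1 and `OL` to the integer 45.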
The second field of an ED record describes an overlap between a pair of sequences. This field contains 10 elements:
- sequence 1 name
- sequence 2 name
- sequence 1 overlap start (0 based)
- sequence 1 overlap end (inclusive)
- sequence 1 length
- sequence 2 overlap start (0 based)
- sequence 2 overlap end (inclusive)
- sequence 2 length
- sequence 2 orientation (1 for reverse with respect to sequence 1, 0 for forward)
- number of differences in overlap (0 for perfect overlaps, which is the default).
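The ten elements listed above can be unpacked as in the following Python sketch. The `Overlap` class and `parse_overlap` function are hypothetical names chosen for illustration, and the sketch assumes the elements within the field are separated by whitespace. Because the end coordinates are inclusive, the overlap length on each sequence is `end - start + 1`.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    name1: str
    name2: str
    start1: int      # 0-based start on sequence 1
    end1: int        # inclusive end on sequence 1
    length1: int     # full length of sequence 1
    start2: int      # 0-based start on sequence 2
    end2: int        # inclusive end on sequence 2
    length2: int     # full length of sequence 2
    is_reverse: bool  # True if sequence 2 is reversed relative to sequence 1
    num_diff: int    # number of differences in the overlap

    def overlap_length1(self):
        # End coordinates are inclusive, hence the +1.
        return self.end1 - self.start1 + 1

    def overlap_length2(self):
        return self.end2 - self.start2 + 1


def parse_overlap(field):
    """Parse the second field of an ED record into an Overlap."""
    parts = field.split()  # assumption: whitespace-separated elements
    if len(parts) != 10:
        raise ValueError(f"expected 10 elements, got {len(parts)}")
    name1, name2 = parts[0], parts[1]
    s1, e1, l1, s2, e2, l2, orientation, num_diff = map(int, parts[2:])
    return Overlap(name1, name2, s1, e1, l1, s2, e2, l2,
                   orientation == 1, num_diff)
```

Applied to the first edge in the example, `parse_overlap("read2 read1 0 46 50 3 49 50 0 0")` describes a forward overlap covering positions 0 to 46 of read2 and positions 3 to 49 of read1, which is 47 bases on each read.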