Ability to align against a multi-contig construct #56

Open
jdidion opened this issue Sep 19, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@jdidion
Collaborator

jdidion commented Sep 19, 2023

Often, multiple contigs in the reference FASTA belong to the same construct (such as a viral vector or plasmid). It would be nice to have a way to describe the order of the component contigs in a construct, and to annotate a construct as circular (so that a jump from the end of the last contig to the beginning of the first would have zero cost).

I can think of a number of ways to do this (a comma-delimited list of contigs on the command line; a custom-format text or JSON file), but I would prefer a standardized format if one exists. Maybe GFA?

jdidion added the enhancement label Sep 19, 2023
@nh13
Member

nh13 commented Sep 19, 2023

Why not use the sequence dictionary (.dict) file? We already perform a lookup if it exists to find annotated circular contigs using the @SQ.TP tag.
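For reference, a minimal sketch of that kind of lookup, assuming a standard tab-separated sequence dictionary whose @SQ lines carry SN and, optionally, TP. The function names `parse_sq_line` and `circular_contigs` are hypothetical and are not taken from the tool's code.

```python
from typing import Dict, Iterable, List


def parse_sq_line(line: str) -> Dict[str, str]:
    """Split one @SQ header line into a tag -> value mapping."""
    fields = line.rstrip("\n").split("\t")[1:]
    # Each field looks like "SN:chr1"; split on the first colon only,
    # because values (e.g. UR:file:/path) may themselves contain colons.
    return dict(field.split(":", 1) for field in fields)


def circular_contigs(dict_lines: Iterable[str]) -> List[str]:
    """Return the names of contigs whose @SQ line carries TP:circular."""
    names = []
    for line in dict_lines:
        if not line.startswith("@SQ\t"):
            continue  # skip @HD and any other header records
        tags = parse_sq_line(line)
        if tags.get("TP") == "circular":
            names.append(tags["SN"])
    return names


# Illustrative data only; these contig names are made up.
example = [
    "@HD\tVN:1.6\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:plasmid\tLN:500\tTP:circular\n",
]
print(circular_contigs(example))  # ['plasmid']
```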

I am tempted to do this by convention: use the @SQ.AS tag to identify the contigs in a linear chain (the value being the canonical plasmid/vector name), with the @SQ.TP tag set to linear for all but the last contig in the chain, which is set to circular. But this feels against the intent of SAM.

There's no reason we cannot instead use lowercase tags to denote this information. Thoughts?

@jdidion
Collaborator Author

jdidion commented Sep 19, 2023

That could work.

I am okay with stipulating that at least one of the following must be true for entries in the dict with the same AS:

  1. They need to be in the same order they appear in the construct
  2. They all need to be annotated with a custom tag indicating their index within the construct
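The two conditions above could be checked roughly as follows. This is only a sketch: it assumes each @SQ record has already been parsed into a tag-to-value dict, listed in file order, and it uses a hypothetical lowercase tag `ci` for the index within the construct. Neither the tag name nor `order_constructs` exists anywhere yet.

```python
from collections import defaultdict
from typing import Dict, List

INDEX_TAG = "ci"  # hypothetical custom tag: index of the contig in its construct


def order_constructs(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group @SQ records by AS and return contig names in construct order.

    If every member of a group carries INDEX_TAG, sort by it (condition 2).
    If no member carries it, trust the order in the dict (condition 1).
    A mix of tagged and untagged members is rejected as ambiguous.
    """
    groups = defaultdict(list)
    for rec in records:
        if "AS" in rec:
            groups[rec["AS"]].append(rec)

    ordered = {}
    for name, members in groups.items():
        tagged = [m for m in members if INDEX_TAG in m]
        if len(tagged) == len(members):
            members = sorted(members, key=lambda m: int(m[INDEX_TAG]))
        elif tagged:
            raise ValueError(f"construct {name!r}: only some contigs have {INDEX_TAG}")
        ordered[name] = [m["SN"] for m in members]
    return ordered


# Illustrative records; names are made up.
records = [
    {"SN": "backbone", "AS": "vec1", "ci": "1"},
    {"SN": "insert", "AS": "vec1", "ci": "0"},
    {"SN": "chr1"},
]
print(order_constructs(records))  # {'vec1': ['insert', 'backbone']}
```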

Circularity is a bit less clear. We could assume that if any contig in the construct has TP:circular then the entire construct is considered circular. Alternatively, we could introduce a custom tag to denote a circular construct, and/or provide an option to specify the AS of any constructs in the dict to be considered circular.

@nh13
Member

nh13 commented Sep 19, 2023

We could also just have “current index” and “next index”?

@jdidion
Collaborator Author

jdidion commented Sep 19, 2023

That works - takes care of both ordering and circularity.
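To make that concrete, here is a sketch of how a "current index" / "next index" pair could encode both properties. It assumes two hypothetical lowercase tags, `ci` (current) and `ni` (next), on the @SQ records of one construct; the tag names and `walk_construct` are illustrative only. A chain whose last contig points back to the first is circular; a chain whose last contig has no next index is linear.

```python
from typing import Dict, List, Tuple

CUR_TAG = "ci"   # hypothetical tag: this contig's index in the construct
NEXT_TAG = "ni"  # hypothetical tag: index of the contig that follows it


def walk_construct(members: List[Dict[str, str]]) -> Tuple[List[str], bool]:
    """Follow next-index links; return (ordered contig names, is_circular)."""
    by_index = {m[CUR_TAG]: m for m in members}
    next_targets = {m[NEXT_TAG] for m in members if NEXT_TAG in m}
    # A linear chain has one head that nobody points to; a closed loop has none.
    heads = [i for i in by_index if i not in next_targets]
    start = heads[0] if heads else min(by_index, key=int)

    order, seen = [], set()
    current = start
    while current is not None and current not in seen:
        if current not in by_index:
            raise ValueError(f"next index {current!r} matches no contig")
        seen.add(current)
        rec = by_index[current]
        order.append(rec["SN"])
        current = rec.get(NEXT_TAG)

    if len(order) != len(members) or (current is not None and current != start):
        raise ValueError("links do not form a single chain or loop")
    return order, current == start


# Illustrative records; names are made up.
loop = [
    {"SN": "a", "ci": "0", "ni": "1"},
    {"SN": "b", "ci": "1", "ni": "2"},
    {"SN": "c", "ci": "2", "ni": "0"},
]
print(walk_construct(loop))  # (['a', 'b', 'c'], True)
```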

I do think we should also have a default interpretation for when multiple sequences share the same AS and there are no custom tags. I suggest assuming that the contigs are ordered and that the construct is circular if any sequence is annotated as circular. Either that, or show a warning that the sequences will be treated as independent despite having the same AS.
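A sketch of both default behaviours, assuming parsed @SQ records (tag-to-value dicts) in file order. The function name `default_constructs` and the `treat_as_independent` switch are hypothetical, used only to show the two alternatives side by side.

```python
import warnings
from typing import Dict, List, Tuple


def default_constructs(
    records: List[Dict[str, str]], treat_as_independent: bool = False
) -> Dict[str, Tuple[List[str], bool]]:
    """Map each shared AS value to (contigs in file order, is_circular).

    Default rule: contigs sharing an AS form one construct, ordered as in the
    dict, and the construct is circular if any member has TP:circular.
    Alternative rule: return nothing and warn that they stay independent.
    """
    constructs: Dict[str, Tuple[List[str], bool]] = {}
    for rec in records:
        name = rec.get("AS")
        if name is None:
            continue
        contigs, circular = constructs.get(name, ([], False))
        contigs.append(rec["SN"])
        constructs[name] = (contigs, circular or rec.get("TP") == "circular")

    # Only an AS shared by two or more contigs describes a multi-contig construct.
    multi = {n: v for n, v in constructs.items() if len(v[0]) > 1}
    if treat_as_independent:
        for name, (contigs, _) in multi.items():
            warnings.warn(
                f"contigs {contigs} share AS {name!r} but are treated as independent"
            )
        return {}
    return multi


# Illustrative records; names are made up.
records = [
    {"SN": "itr5", "AS": "aav"},
    {"SN": "payload", "AS": "aav", "TP": "circular"},
    {"SN": "chr1", "AS": "other"},
]
print(default_constructs(records))  # {'aav': (['itr5', 'payload'], True)}
```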
