-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add cigar utility method get_alignment_offsets #188
base: main
Are you sure you want to change the base?
Changes from 4 commits
961990a
ddf1ef5
4fa85d6
f371b6c
e041787
b07c3e4
c334b43
6124684
5c8a76a
b535906
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -368,7 +368,7 @@ class CigarOp(enum.Enum): | |||||
code (int): The `~pysam` cigar operator code. | ||||||
character (int): The single character cigar operator. | ||||||
consumes_query (bool): True if this operator consumes query bases, False otherwise. | ||||||
consumes_target (bool): True if this operator consumes target bases, False otherwise. | ||||||
consumes_reference (bool): True if this operator consumes reference bases, False otherwise. | ||||||
""" | ||||||
|
||||||
M = (0, "M", True, True) #: Match or Mismatch the reference | ||||||
|
@@ -547,6 +547,48 @@ def length_on_target(self) -> int: | |||||
"""Returns the length of the alignment on the target sequence.""" | ||||||
return sum([elem.length_on_target for elem in self.elements]) | ||||||
|
||||||
def get_alignment_offsets(self, reverse: bool = False) -> Tuple[int, int]: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The name is quite ambiguous given that we're dealing with an alignment between a query and a target. How about one of the following:
Also: consider returning a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you mean a python range ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. neat idea! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Functions prefixed with Can we drop the prefix too?
Suggested change
https://google.github.io/styleguide/pyguide.html#315-getters-and-setters |
||||||
""" | ||||||
Get the starting and ending offsets for the alignment based on the CIGAR string. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The whole docstring needs to be out-dented to start at the same column as the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried this and mkdocs responds: 🤮
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
Args: | ||||||
reverse: If True, count from the end of the read sequence. | ||||||
Otherwise (default behavior), count from the beginning of the read sequence. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is unclear to me - does it just count from the end? Or does it also swap the order of "start" vs. "end"? I think you're saying that:
? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes. |
||||||
|
||||||
Returns: | ||||||
A tuple (start, end), defining the start and end offsets of the aligned part | ||||||
of the read. These offsets are 0-based and open-ended, with respect to the | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
beginning of the read sequence. (If 'reverse' is True, the offsets are with | ||||||
respect to the end of the read sequence.) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
If the Cigar contains no alignment operators that consume sequence bases, or | ||||||
only clipping operators, the start and end offsets will be the same value | ||||||
(indicating an empty region). This shared value will be the offset to the first | ||||||
base consumed by a non-clipping operator or the length of the read sequence if | ||||||
there is no such base. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that either a) this function should return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd argue in favor of |
||||||
|
||||||
# | ||||||
""" # TODO: figure out how to remove the '#' from the documentation above without | ||||||
# breaking the build | ||||||
start_offset: int = 0 | ||||||
end_offset: int = 0 | ||||||
element: CigarElement | ||||||
alignment_began = False | ||||||
elements = self.elements if not reverse else reversed(self.elements) | ||||||
for element in elements: | ||||||
if element.operator.is_clipping and not alignment_began: | ||||||
# We are in the clipping operators preceding the alignment | ||||||
start_offset += element.length_on_query | ||||||
end_offset += element.length_on_query | ||||||
elif not element.operator.is_clipping: | ||||||
# We are within the alignment | ||||||
alignment_began = True | ||||||
end_offset += element.length_on_query | ||||||
Comment on lines
+618
to
+621
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So ... I think you have to consider edge-case cigars here. What does this function report for:
|
||||||
else: | ||||||
# We have exited the alignment and are in the clipping operators after the alignment | ||||||
break | ||||||
|
||||||
return start_offset, end_offset | ||||||
|
||||||
def __getitem__( | ||||||
self, index: Union[int, slice] | ||||||
) -> Union[CigarElement, Tuple[CigarElement, ...]]: | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an unrelated fix. if deemed imporatant I could extract it to a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yfarjoun This uses
target
intentionally - not all alignments are between a read and a reference - what if you align two reads vs. each other, or two contigs vs. each other?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tfenne While not all alignments are between a read and a reference, the CIGAR specification uses the terms "query" and "reference". I agree with Yossi, I find it clearer to have our documentation consistent with spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I was correcting a broken link.... if this is intentional, I'll bring it back and let's haved the discussion in a different PR (if at all)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% on different PR. Though I'll also argue there for maintaining "target".
Little known fact is that CIGAR actually predates SAM/BAM and Google believes it was first documented as part of exonerate which uses "query" and "target" 😛