-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add cigar utility method get_alignment_offsets #188
Open
yfarjoun
wants to merge
10
commits into
main
Choose a base branch
from
yf_add_cigar_utility_methods
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 7 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
961990a
- add cigar utility method get_alignment_offsets to find the offsets …
yfarjoun ddf1ef5
Update fgpyo/sam/__init__.py
yfarjoun 4fa85d6
Update tests/fgpyo/sam/test_cigar.py
yfarjoun f371b6c
- responded to review comments.
yfarjoun e041787
responding to review comments
yfarjoun b07c3e4
docs
yfarjoun c334b43
rename
yfarjoun 6124684
reformat again
yfarjoun 5c8a76a
added "RangeOfBases class to hold the results" (with some utility met…
yfarjoun b535906
types...
yfarjoun File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -547,6 +547,47 @@ def length_on_target(self) -> int: | |||
"""Returns the length of the alignment on the target sequence.""" | ||||
return sum([elem.length_on_target for elem in self.elements]) | ||||
|
||||
def query_alignment_offsets(self, reverse: bool = False) -> Optional[range]: | ||||
"""Gets the 0-based, end-exclusive positions of the first and last aligned base in the query. | ||||
The resulting range will contain the range of positions in the SEQ string for the bases | ||||
that are aligned. If no bases are aligned, the return value will be None. | ||||
|
||||
Args: | ||||
reverse: If True, count from the end of the query. i.e. find the offsets | ||||
using the reversed elements of the cigar. | ||||
|
||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||
|
||||
Returns: | ||||
A range, defining the start and stop offsets of the aligned part | ||||
of the query. These offsets are 0-based and open-ended, with respect to the | ||||
beginning of the query. (If 'reverse' is True, the offsets are with | ||||
respect to the reversed query.) | ||||
If no bases are aligned, the return value will be None. | ||||
""" | ||||
start_offset: int = 0 | ||||
end_offset: int = 0 | ||||
element: CigarElement | ||||
alignment_began = False | ||||
elements = self.elements if not reverse else reversed(self.elements) | ||||
for element in elements: | ||||
if element.operator.is_clipping and not alignment_began: | ||||
# We are in the clipping operators preceding the alignment | ||||
# Note: hardclips have length-on-query=0 | ||||
start_offset += element.length_on_query | ||||
end_offset += element.length_on_query | ||||
elif not element.operator.is_clipping: | ||||
# We are within the alignment | ||||
alignment_began = True | ||||
end_offset += element.length_on_query | ||||
Comment on lines
+618
to
+621
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So ... I think you have to consider edge-case cigars here. What does this function report for:
|
||||
else: | ||||
# We have exited the alignment and are in the clipping operators after the alignment | ||||
break | ||||
|
||||
ret = range(start_offset, end_offset) | ||||
if not alignment_began or len(ret) == 0: | ||||
return None | ||||
return ret | ||||
|
||||
def __getitem__( | ||||
self, index: Union[int, slice] | ||||
) -> Union[CigarElement, Tuple[CigarElement, ...]]: | ||||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tfenne I disagree with the suggestion to use a
range
object.range
range
cannot be used to slice a string, nor does it support tuple unpacking.I think it would be more ergonomic to return a
tuple[int, int]
, though I wouldn't mind seeing a use case whererange
provides advantages. (Nor would I be opposed to using aNamedTuple
withstart
andend
attributes)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opinionated here, but am noticing that range does have start-stop access, so in that regard it will be equivalent to the named NamedTuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(almost stop!=end)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but you can't unpack the
range
i.e.
So when calling the method, you can either accept the tuple or unpack it immediately, depending on which makes more sense in your context
e.g. I imagine most use cases will look like the following