Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-fixes a bug in the liftover logic whereby when the alleles couldn't … #1956

Merged

Conversation

yfarjoun
Copy link
Contributor

@yfarjoun yfarjoun commented Apr 5, 2024

…be extended to the "left" because the start of the variant was already at position 1, the resulting VC was corrupt leading to the problem discussed in !1951

  • added a unit test (which fails with the same error as the issue prior to the application of the changes)
  • someone should add an actual live (with vcf etc....)
  • fixes !1951
  • other tests are failing...need to look at this more carefully.

Description

Give your PR a concise yet descriptive title
Please explain the changes you made here.
Explain the motivation for making this change. What existing problem does the pull request solve?
Mention any issues fixed, addressed or otherwise related to this pull request, including issue numbers or hard links for issues in other repos.
You can delete these instructions once you have written your PR description.


Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on github actions

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests

…be extended to the "left" because the start of the variant was already at position 1, the resulting VC was corrupt leading to the problem discussed in !1951

- added a unit test (which fails with the same error as the issue prior to the application of the changes)
- _someone_ should add an actual live (with vcf etc....)
- fixes !1951
- other tests are failing...need to look at this more carefully.
@yfarjoun
Copy link
Contributor Author

yfarjoun commented Apr 5, 2024

@rickymagner

I think I fixed the bug. I managed to recreate the same cryptic error in a testcase and then changes to the code removed it. Do you mind trying this version on the vcf/chain/reference you have already setup?

@rickymagner
Copy link
Contributor

Hi @yfarjoun, thanks for looking into this! I can confirm I just built your branch and ran it on the single problematic variant from the original issue, and the liftover completed successfully. I haven't looked at your code too carefully but see there's a test for this case so it seems this would provide a generalizable fix. If you make this a full PR rather than draft someone (I or someone else) can review and try to get it merged. Thanks!

Copy link
Contributor

@rickymagner rickymagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Yossi, thanks for this! It seems to fix the issue well. I just made some remarks about updating the comments mostly to help make clear what is happening in the code for the next person to try to debug this tool. I know it's not your fault for the original comments, but since we're here it's probably worth spending a few minutes doing. I highlighted a few key places some extra comments could help a bit, I think. In particular, examples of what sort of variant triggers the blocks would be very useful. Thanks!

VariantContext newVc = builder.make();
final String refString = StringUtil.bytesToString(REFERENCE_WITH_REPEATS.getBases(), newVc.getStart() - 1, newVc.getEnd() - newVc.getStart() + 1);

Assert.assertEquals(refString,"CGTC");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a space after comma


changesInAlleles = false;
// 2. if alleles end with the same nucleotide then
if (alleleBasesMap.values().stream()
.collect(Collectors.groupingBy(a -> a[a.length - 1], Collectors.toSet()))
.size() == 1 && theEnd > 1) {
// 3. truncate rightmost nucleotide of each allele
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be turned into a continuation of comment 2? If I understand right, this block just trims the alleles if they end with the same string (e.g. REF = "ACG", ALT="ATG" -> REF = "AC", ALT="AT" ... maybe even put this example in comment?)

@@ -390,20 +390,29 @@ protected static void leftAlignVariant(final VariantContextBuilder builder, fina
.map(a -> a.length)
.anyMatch(l -> l == 0)) {
// 6. extend alleles 1 nucleotide to the left
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should also merge with 5. I'm also not sure what it means to have an "empty allele." Can this have an example in comment?

@@ -390,20 +390,29 @@ protected static void leftAlignVariant(final VariantContextBuilder builder, fina
.map(a -> a.length)
.anyMatch(l -> l == 0)) {
// 6. extend alleles 1 nucleotide to the left


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove a newline

extendLeft = false;
theEnd++;
}

for (final Allele allele : alleleBasesMap.keySet()) {
// the first -1 for zero-base (getBases) versus 1-based (variant position)
Copy link
Contributor

@rickymagner rickymagner Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments should be updated/repositioned/removed


// 7. end if
}
changesInAlleles = theStart!=oldStart || theEnd != oldEnd;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means repeat the above block of truncating matching right bases, and handling whatever empty alleles mean, whenever there's been a change in the variant. Is a while block the best way to handle this? Why not just truncate all matching bases on the right in one go, and then handle this other empty edge case as well?

@yfarjoun
Copy link
Contributor Author

yfarjoun commented Apr 9, 2024

As the comment for the function mentioned, the code follows the paper by G. Abecsis. You can see the pseudo code (including the numbering used here) here: https://genome.sph.umich.edu/wiki/Variant_Normalization

I'm happy to make it clearer somehow, but wanted to make sure the context was understood.... :-)

@rickymagner
Copy link
Contributor

Ah, good to know. In that case, it makes sense to keep in that format, but maybe this should be emphasized a bit more in the top level comments. There is a reference to the paper, but it's unclear the steps in the comments are those from the paper (at least it was to me). Maybe a sentence or two up top can clarify this.

@yfarjoun yfarjoun marked this pull request as ready for review April 10, 2024 21:01
@yfarjoun
Copy link
Contributor Author

thanks for the comments, @rickymagner I think I responded to all of them.

Copy link
Contributor

@rickymagner rickymagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a lot more context in the comments! This will surely help whomever looks at this code in the future. Looks good for merging, and thanks for handling this bug.

@rickymagner rickymagner merged commit 372cd5c into broadinstitute:master Apr 11, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants