How to handle Allele normalization for Range Locations #237

Open
ahwagner opened this issue Aug 25, 2023 · 6 comments
Labels
2.0-alpha Issues related to VRS 2.0-alpha branch question Further information is requested Stale-exempt

Comments

@ahwagner
Member

One major theme raised in #234 is the question of "how do we handle Allele normalization when the Allele Location is specified by Ranges"? To me, these have always seemed to be a shorthand for "I did a targeted region assay and want to craft general statements about copy number in those regions and the potential broader impact they have". I know we allow people to create Alleles with Range-based Locations anyway, but... why? The PR supports those cases and raises interesting questions, e.g. what do we do with definite range intervals?

@ahwagner added the question and 2.0-alpha labels on Aug 25, 2023
@larrybabb
Contributor

All great points. After looking at this for 30 minutes and thinking about it on a Sunday night, I tend to fall on the following side of things...

  1. We should not allow ranges as endpoints in any alleles.
  2. We should allow range endpoints in copy numbers (only at this point)
  3. You cannot normalize a location with one or both endpoints as ranges (definite or indefinite).

I think these range endpoints are only needed for microarray calls (unless someone can educate me otherwise). I believe these microarray calls really only produce representations of deleted or duplicated regions (often with ambiguous endpoints). I think we will be treating these as copy number variants (CopyNumberChange variants); in a way, this will help reduce the confusion about what these types of variants are and where they belong.

Again, I'm no expert in all the places where these types of ambiguous variant calls come from, but I would say that calling them alleles is not exactly aligned with our computational definition. As we have noted many times, any "deletion" could be considered a molecular variant and thus an Allele, but it is also a copy number (system) loss. Let's discuss further, but that's my Sunday night feedback for what it's worth.
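
For discussion, the three rules proposed above could be sketched roughly as below. This is only an illustration under stated assumptions: `Location`, `validate`, `can_normalize`, and `RANGE_ALLOWED_TYPES` are hypothetical names, not vrs-python API, and a Range is modeled as a plain `(lower, upper)` tuple where `None` marks an indefinite bound.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Assumption: a Range is a (lower, upper) pair; None marks an indefinite bound.
Range = Tuple[Optional[int], Optional[int]]
Coordinate = Union[int, Range]

# Rule 2: only copy number variation may carry Range endpoints (for now).
RANGE_ALLOWED_TYPES = {"CopyNumberChange"}


@dataclass(frozen=True)
class Location:
    """Hypothetical stand-in for a location with integer or Range endpoints."""
    start: Coordinate
    end: Coordinate


def has_range_endpoint(loc: Location) -> bool:
    """True if either endpoint is a Range (definite or indefinite)."""
    return isinstance(loc.start, tuple) or isinstance(loc.end, tuple)


def validate(variation_type: str, loc: Location) -> None:
    """Rules 1 and 2: reject Range endpoints outside the allowed types."""
    if has_range_endpoint(loc) and variation_type not in RANGE_ALLOWED_TYPES:
        raise ValueError(f"{variation_type} may not use Range endpoints")


def can_normalize(loc: Location) -> bool:
    """Rule 3: a location with any Range endpoint is not normalized."""
    return not has_range_endpoint(loc)
```

Under this sketch, an Allele on `Location(100, 200)` passes validation and can be normalized, while the same Allele on `Location((None, 100), (200, None))` is rejected.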

@larrybabb
Contributor

@ahwagner is it possible that indefinite or definite range endpoints should be treated as either one of the forthcoming SV breakend or breakpoint classes? I'm still not sure I have my head around the breakend concept completely, but it sure feels like the indefinite ranges are similar to a breakend.

Please educate me on why this is a nonsensical idea.

@ahwagner
Member Author

@larrybabb regarding #237 (comment), I think that indefinite range data structures (and the SequenceLocation objects that use them) are compatible with breakend representation. I'm going to be reviewing and commenting on some of the outstanding SV-VRS issues later this week and will come back to this, but wanted to move the discussion about your recent comment over to ga4gh/vrs#365, where this same solution was proposed by @cmprocknow.


github-actions bot commented Jan 6, 2024

This issue was marked stale due to inactivity.

@larrybabb
Contributor

@ahwagner Where do we stand on this? Are we fully supporting the notion that the start and end positions on a SequenceLocation can each be either an integer or a Range? This is fairly critical if we plan to treat all Range-based positions in HGVS expressions as Adjacency types (or Breakends). I'd like to know if we should focus on a firm direction before we go much further. We are about to implement Range in vrs-python for the allele and cnv translators for HGVS expressions like NC_000006.12:g.(?_57046622)_(57088889_?)del.

It seems like we may just presume that any HGVS expression that has a Range endpoint is really a structural variant of type del or dup that can be represented as an Adjacency. Please clarify your perspective here.
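
To make the question concrete, here is a minimal sketch of how an expression of that shape might map onto Range-valued start and end coordinates. It is not the vrs-python translator: `parse_ranged_hgvs` is a hypothetical helper, the regex only covers the `(a_b)_(c_d)del|dup` pattern quoted above, and the conversion from 1-based HGVS positions to 0-based inter-residue coordinates (subtract 1 from start bounds, keep end bounds) is an assumption about how the bounds would be translated.

```python
import re
from typing import Optional

# Only the "(a_b)_(c_d)del" / "(a_b)_(c_d)dup" genomic pattern is handled here.
_PATTERN = re.compile(
    r"^(?P<ac>[^:]+):g\."
    r"\((?P<s1>\?|\d+)_(?P<s2>\?|\d+)\)_"
    r"\((?P<e1>\?|\d+)_(?P<e2>\?|\d+)\)"
    r"(?P<kind>del|dup)$"
)


def _bound(token: str, offset: int) -> Optional[int]:
    """Turn '?' into None (indefinite) and digits into a shifted int."""
    return None if token == "?" else int(token) + offset


def parse_ranged_hgvs(expr: str) -> dict:
    """Hypothetical helper: split a ranged HGVS del/dup into Range endpoints."""
    m = _PATTERN.match(expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr}")
    return {
        "accession": m["ac"],
        # Assumption: start bounds shift by -1 for inter-residue coordinates.
        "start": [_bound(m["s1"], -1), _bound(m["s2"], -1)],
        # Assumption: end bounds are unchanged in inter-residue coordinates.
        "end": [_bound(m["e1"], 0), _bound(m["e2"], 0)],
        "kind": m["kind"],
    }


parsed = parse_ranged_hgvs("NC_000006.12:g.(?_57046622)_(57088889_?)del")
```

Whether the resulting start and end Ranges then belong on an Allele, a CopyNumberChange, or an Adjacency is exactly the open question; the sketch only shows that both endpoints carry one definite and one indefinite bound.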

@ehclark
Contributor

ehclark commented Apr 4, 2024

I will make a comment here not fully understanding all the details. But I do think it is relevant.

For CNVs specifically, our current filtration/annotation process uses bedtools intersect. The CNV calls are coming from DRAGEN. The CNV databases we are using include ClinVar, ClinGen, Decipher, GeneDx, Manta, and gnomAD. The typical requirement is a 50% reciprocal overlap between the patient/subject calls and the database.

In the future, when we adopt VRS IDs for CNVs, I think we will want to be able to do the equivalent of bedtools intersect using the VRS objects. It looks to me like either Range or Adjacency would support this computation, although for Adjacency it could get complicated if one or both of the adjoinedSequences were IRIs?
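
As a rough sketch of what that computation might look like with Range endpoints, assuming start and end are each either an integer or a two-element `[lower, upper]` list with `None` for an indefinite bound: the helper names `confident_interval` and `reciprocal_overlap` are hypothetical, and collapsing each Range to its inner bound (the region certain to be affected) is one possible policy, not something the spec prescribes.

```python
from typing import Optional, Sequence, Tuple, Union

Coordinate = Union[int, Sequence[Optional[int]]]
Interval = Tuple[int, int]


def _pick(coord: Coordinate, index: int) -> int:
    """Return an int coordinate, preferring coord[index] for a Range."""
    if isinstance(coord, int):
        return coord
    preferred, fallback = coord[index], coord[1 - index]
    value = preferred if preferred is not None else fallback
    if value is None:
        raise ValueError("Range has no definite bound")
    return value


def confident_interval(start: Coordinate, end: Coordinate) -> Interval:
    """Collapse Range endpoints to the inner bounds (assumed policy)."""
    # Inner bound of a start Range is its upper value; of an end Range, its lower.
    return _pick(start, 1), _pick(end, 0)


def reciprocal_overlap(a: Interval, b: Interval, min_fraction: float = 0.5) -> bool:
    """True if the overlap covers at least min_fraction of both intervals."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return False
    return (overlap >= min_fraction * (a[1] - a[0])
            and overlap >= min_fraction * (b[1] - b[0]))


call = confident_interval([None, 100], [200, None])
matches_db = reciprocal_overlap(call, (150, 250))
```

This is meant to approximate the 50% reciprocal overlap criterion described above (comparable in spirit to `bedtools intersect -f 0.5 -r`); a real implementation would also need to check that both locations refer to the same sequence.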


5 participants