Normalization extensions for VRS 2.x #334

Open
1 of 2 tasks
ahwagner opened this issue Jan 30, 2024 · 4 comments
Labels
2.0-alpha (Issues related to VRS 2.0-alpha branch)

Comments

@ahwagner
Member

ahwagner commented Jan 30, 2024

As VRS grows to encompass more complex use cases, additional normalization constraints need to be imposed to promote consistency in computed identifiers. In VRS 1.x the primary normalization concern was correcting ambiguous sequence insertions/deletions in repeating regions. However, in 2.x we have additional concerns:

  • Preferentially representing a variant using RLE when possible
  • Haplotype member ordering conventions

We should implement these additional normalization rules in VRS-Python.
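To make the RLE preference concrete, here is a rough sketch of how a trimmed insertion or deletion in a repeat could be expanded over its ambiguous region and then expressed as a ReferenceLengthExpression. It is a simplified illustration written against plain strings and dicts, not the VRS-Python implementation: the function name `normalize_indel_sketch`, the dict layout, and the choice to leave reference-identical alleles unhandled are all assumptions made for the example, and the VRS 2.x specification remains the authority on the exact rules.

```python
def normalize_indel_sketch(ref: str, start: int, end: int, alt: str) -> dict:
    """Simplified illustration only; names and dict layout are hypothetical."""
    ref_allele = ref[start:end]  # interbase coordinates

    # Trim the shared suffix, then the shared prefix.
    while ref_allele and alt and ref_allele[-1] == alt[-1]:
        ref_allele, alt, end = ref_allele[:-1], alt[:-1], end - 1
    while ref_allele and alt and ref_allele[0] == alt[0]:
        ref_allele, alt, start = ref_allele[1:], alt[1:], start + 1

    if ref_allele and alt:
        # Substitution or delins: keep a literal sequence.
        return {"start": start, "end": end,
                "state": {"type": "LiteralSequenceExpression", "sequence": alt}}
    if not ref_allele and not alt:
        raise NotImplementedError("reference-identical alleles are out of scope here")

    # Pure insertion or deletion: roll the inserted/deleted unit left and right.
    unit = ref_allele or alt
    n = len(unit)
    left, i = start, 0
    while left > 0 and ref[left - 1] == unit[(n - 1 - i) % n]:
        left, i = left - 1, i + 1
    right, j = end, 0
    while right < len(ref) and ref[right] == unit[j % n]:
        right, j = right + 1, j + 1

    if not ref_allele and left == right:
        # Insertion that cannot be shifted: nothing reference-derived to count.
        return {"start": start, "end": end,
                "state": {"type": "LiteralSequenceExpression", "sequence": alt}}

    # Otherwise prefer RLE over the fully expanded region.
    expanded_alt = ref[left:start] + alt + ref[end:right]
    return {"start": left, "end": right,
            "state": {"type": "ReferenceLengthExpression",
                      "length": len(expanded_alt),
                      "repeatSubunitLength": n,
                      "sequence": expanded_alt}}
```

With the toy reference `TTCAGCAGCAGTT`, inserting `CAG` at interbase position 5 and deleting the `CAG` at 5..8 both expand to the interval 2..11, with RLE lengths of 12 and 6 respectively and a repeat subunit length of 3. The same variant written at different positions in the repeat ends up with the same representation, which is the consistency in computed identifiers that this issue is after.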

ahwagner added the 2.0-alpha label (Issues related to VRS 2.0-alpha branch) on Jan 30, 2024
@ahwagner
Member Author

ahwagner commented Jan 30, 2024

Preferential RLE representation checks out: tested in #336

@ahwagner
Member Author

ahwagner commented Feb 5, 2024

I went digging into the VRS-Python code last weekend and reviewed the comments in #338. We now have a model that includes the notion of ga4gh.keys for digest serialization, and it works across both identifiable and non-identifiable classes, so we should formalize how we want to serialize in VRS (and encode this in VRS-Python).

Starting a discussion thread in VRS (ga4gh/vrs#465) to address this.
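For readers following along, the idea of key-driven digest serialization can be sketched roughly as below: each class lists the properties that count toward its digest, and only those are serialized. This is an assumption-laden illustration, not the VRS-Python code. The `GA4GH_KEYS` table and the `serialize_sketch` function are hypothetical names, the key lists are guesses for the two sequence expression classes, and real serialization has further rules (for example around nested objects) that are omitted here. The `sha512t24u` helper follows the usual GA4GH definition: SHA-512, truncated to 24 bytes, base64url-encoded.

```python
import base64
import hashlib
import json

# Hypothetical per-class key lists, for illustration only.
GA4GH_KEYS = {
    "ReferenceLengthExpression": ["length", "repeatSubunitLength", "type"],
    "LiteralSequenceExpression": ["sequence", "type"],
}


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, base64url-encoded."""
    digest = hashlib.sha512(blob).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")


def serialize_sketch(obj: dict) -> bytes:
    """Keep only the listed keys, then emit compact JSON with sorted keys."""
    keys = GA4GH_KEYS[obj["type"]]
    filtered = {k: obj[k] for k in keys if k in obj}
    return json.dumps(filtered, sort_keys=True, separators=(",", ":")).encode("utf-8")


# The optional sequence on an RLE does not change the serialized blob,
# because it is not in the (assumed) key list.
rle = {"type": "ReferenceLengthExpression", "length": 12, "repeatSubunitLength": 3}
rle_with_seq = dict(rle, sequence="CAGCAGCAGCAG")
same_blob = serialize_sketch(rle) == serialize_sketch(rle_with_seq)
```

Driving serialization from a key list means the same routine can apply to identifiable and non-identifiable classes alike; whether a digest is then computed and stored for a given class is a separate decision.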

ahwagner added a commit that referenced this issue Feb 6, 2024
* add normalization example to test notebook

* update notebook metadata

* a few simple behavior tests

* Add keys to ReferenceLengthExpression and LiteralSequenceExpression

* Make LiteralSequenceExpression not-identifiable

* remove unnecessary / unused code

* addresses #338 (comment)

* Fix ReferenceLengthExpression tests in test_allele_translator

* linewise diff for test_annotate_vcf_grch38_noattrs

* Update test_vcf_expected_output_no_vrs_attrs.vcf.gz ReferenceLengthExpression

* Fix test_annotate_vcf_grch38_attrs

* Fix test_annotate_vcf_grch38_attrs_altsonly

---------

Co-authored-by: Kyle Ferriter <[email protected]>
@korikuzma
Contributor

@ahwagner can we close this since you addressed in #342?

@larrybabb
Contributor

@ahwagner if you are keeping this open because the Haplotype portion from the original issue is not done, please consider breaking that out into a separate issue so this can be closed based on the work in #342.

3 participants