hgvs to vrs is returning valid results when hgvs has IncorrectReferenceAllele #364

Open
larrybabb opened this issue Mar 15, 2024 · 2 comments
Labels
bug Something isn't working priority:high High priority

Comments

@larrybabb
Contributor

In ClinVar there is a variant, NM_006087.3:c.900C>A (267781), that has no HGVS, SPDI, or location data. When I passed it to the MetaKB variation normalizer's translate_from endpoint, which uses vrs-python's translate_from, it returned a valid VRS object.

curl -X 'GET' \
  'https://normalize.cancervariants.org/variation/translate_from?variation=NM_006087.3%3Ac.900C%3EA&fmt=hgvs' \
  -H 'accept: application/json'

Response Body
{
  "query": {
    "variation": "NM_006087.3:c.900C>A",
    "fmt": "hgvs"
  },
  "warnings": [],
  "service_meta_": {
    "name": "variation-normalizer",
    "version": "0.8.1",
    "response_datetime": "2024-03-15T17:10:02.804011Z",
    "url": "https://github.com/cancervariants/variation-normalization"
  },
  "vrs_python_meta_": {
    "name": "vrs-python",
    "version": "2.0.0a2",
    "url": "https://github.com/ga4gh/vrs-python"
  },
  "variation": {
    "id": "ga4gh:VA.AO175l6scMggCBYXONYydcuvMsoZqNXi",
    "type": "Allele",
    "location": {
      "id": "ga4gh:SL.SoeOSfpr0PfwJu_akcSCZ7DMyDRodV-C",
      "type": "SequenceLocation",
      "sequenceReference": {
        "type": "SequenceReference",
        "refgetAccession": "SQ.k_G7nBWO-L7cKeMOjyJlibHhDn1Ts69Q"
      },
      "start": 899,
      "end": 900
    },
    "state": {
      "type": "LiteralSequenceExpression",
      "sequence": "A"
    }
  }
}

So I tried to look up this variant in the ClinGen Allele Registry and found that it failed for the following reason:

We were not able to parse, find, or, register allele using NM_006087.3:c.900C>A HGVS expression or CA Identifier.
The following information might be helpful to understand the reason.

Type of the error: IncorrectReferenceAllele

Explanation: Reference allele does not match for NM_006087.3[6985-0,6986+0), given=C, found=G.

Reference sequence:

Actual allele: G

Provided in the HGVS expression: C

Region: [6985-0,6986+0)

I did not investigate further whether vrs-python skips checking reference alleles; I assume it does.

I don't think vrs-python's translate_from should accept HGVS expressions whose reference alleles do not match the nucleotides actually present in the reference sequence at the position the expression specifies. Like the ClinGen Allele Registry, we should probably throw an exception.
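
To make the proposed check concrete, here is a minimal sketch of what it could look like. This is not vrs-python's API: the exception class, the function name, and the toy sequence are all hypothetical, and the sequence is a plain string standing in for whatever sequence store the translator uses. Coordinates are interbase (0-based start, exclusive end), as in the VRS location above.

```python
class ReferenceAlleleMismatch(ValueError):
    """Hypothetical exception: stated reference allele differs from the sequence."""


def check_reference_allele(sequence: str, start: int, end: int, stated_ref: str) -> None:
    """Raise if stated_ref does not equal sequence[start:end] (interbase coordinates)."""
    if not 0 <= start <= end <= len(sequence):
        raise IndexError(f"interval [{start},{end}) is outside the sequence")
    actual = sequence[start:end]
    if actual.upper() != stated_ref.upper():
        raise ReferenceAlleleMismatch(
            f"Reference allele does not match for [{start},{end}): "
            f"given={stated_ref}, found={actual}"
        )


if __name__ == "__main__":
    # Toy sequence for illustration only; this is NOT real NM_006087.3 data.
    toy_sequence = "ACGTG"

    # The stated reference matches the sequence, so nothing is raised.
    check_reference_allele(toy_sequence, 4, 5, "G")

    # The stated reference is C but the sequence has G, so the check fails.
    try:
        check_reference_allele(toy_sequence, 4, 5, "C")
    except ReferenceAlleleMismatch as exc:
        print(exc)
```

A translator could run a check like this after resolving the expression to a sequence interval and before building the Allele, so a mismatch surfaces as an error rather than as a valid-looking VRS object.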

@larrybabb larrybabb added bug Something isn't working priority:high High priority labels Mar 15, 2024
@larrybabb
Contributor Author

I think we should be checking reference alleles on all format types: SPDI, gnomAD, Beacon, and HGVS.
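
As a rough illustration of applying one check across formats, the sketch below extracts the stated reference allele from SPDI (`sequence:position:deletion:insertion`, 0-based position) and gnomAD-style (`chrom-pos-ref-alt`, 1-based position) strings and compares it to a sequence. Everything here is an assumption for illustration: the parsers, the `validate` helper, and the in-memory `SEQUENCES` dict are hypothetical and are not part of vrs-python. HGVS and Beacon are left out because they need more parsing (and, for `c.` expressions, a coordinate mapping) than fits in a short example.

```python
from typing import Callable, Dict, Optional, Tuple

# (sequence id, 0-based start, stated reference allele or None if not stated)
Stated = Tuple[str, int, Optional[str]]


class ReferenceAlleleMismatch(ValueError):
    """Hypothetical exception: stated reference allele differs from the sequence."""


def parse_spdi(expr: str) -> Stated:
    seq_id, pos, deletion, _insertion = expr.split(":")
    # SPDI allows the deletion to be given as a length; then no allele is stated.
    ref = None if deletion.isdigit() else deletion
    return seq_id, int(pos), ref


def parse_gnomad(expr: str) -> Stated:
    chrom, pos, ref, _alt = expr.split("-")
    return chrom, int(pos) - 1, ref  # convert 1-based position to 0-based start


PARSERS: Dict[str, Callable[[str], Stated]] = {
    "spdi": parse_spdi,
    "gnomad": parse_gnomad,
}


def validate(expr: str, fmt: str, sequences: Dict[str, str]) -> None:
    """Raise ReferenceAlleleMismatch if the stated reference is not in the sequence."""
    seq_id, start, ref = PARSERS[fmt](expr)
    if ref is None:
        return  # nothing to compare against
    actual = sequences[seq_id][start:start + len(ref)]
    if actual.upper() != ref.upper():
        raise ReferenceAlleleMismatch(
            f"{expr}: given={ref}, found={actual} at [{start},{start + len(ref)})"
        )


# Toy sequences for illustration only; not real reference data.
SEQUENCES = {"toy_seq": "ACGTG", "1": "ACGTG"}

if __name__ == "__main__":
    validate("toy_seq:4:G:A", "spdi", SEQUENCES)   # stated reference matches
    validate("1-5-G-A", "gnomad", SEQUENCES)       # same position, 1-based
    try:
        validate("toy_seq:4:C:A", "spdi", SEQUENCES)
    except ReferenceAlleleMismatch as exc:
        print(exc)
```

Routing every format through one comparison step keeps the mismatch behavior consistent, whichever syntax the caller used.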

@korikuzma
Contributor

@larrybabb I think this is related to #151. Once that is added, I can update the normalizer's vrs-python endpoints (which haven't been updated in a long time) to accept kwargs.
