Issue 396 - Migrate to ClinVar XML V2 #432

Merged: 8 commits merged on Jun 4, 2024

apriltuesday (Contributor) commented May 23, 2024

Closes #396

  • Parse XSD version from header so we can continue to parse historical V1 data
  • Extract clinical significance info from the ClinVarRecord class into a new ClinicalClassification class
    • This is not really necessary right now, but enables us to model the additional structure more completely in the future.
  • If an RCV has a single clinical classification with a single description, this is consistent with the current schema, so the record is processed as usual.
  • If there are multiple classifications or descriptions, the record is logged and counted as skipped rather than crashing the pipeline.
  • Most of the other changed files are test datasets. For these I used the same RCVs, taken from a recent ClinVar release, to keep things as stable as possible, though of course there are still some differences.
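A minimal sketch of the first bullet (reading the XSD version from the header), assuming the version is declared in the `xsi:noNamespaceSchemaLocation` attribute of the root element and that the XSD filename ends in `_<major>.<minor>.xsd`. The function `parse_xsd_version`, the regex and the sample headers (including the `example.org` URLs) are hypothetical and are not the pipeline's actual code. Only the first start-tag event is consumed, so a large release file doesn't need to be parsed in full just to pick a parser.

```python
import io
import re
import xml.etree.ElementTree as ElementTree
from typing import BinaryIO, Tuple

# Namespace of the xsi:noNamespaceSchemaLocation attribute (XML Schema instance)
XSI_NS = 'http://www.w3.org/2001/XMLSchema-instance'
# Assumption: the XSD filename ends in "_<major>.<minor>.xsd"
XSD_VERSION_RE = re.compile(r'_(\d+)\.(\d+)\.xsd$')


def parse_xsd_version(xml_stream: BinaryIO) -> Tuple[int, int]:
    """Return the (major, minor) XSD version declared on the root element.

    Returns at the first start-tag event, so only the beginning of a large
    release file is read and parsed.
    """
    for _, root in ElementTree.iterparse(xml_stream, events=('start',)):
        # ElementTree stores namespaced attributes as '{namespace}localname'
        location = root.attrib.get(f'{{{XSI_NS}}}noNamespaceSchemaLocation', '')
        match = XSD_VERSION_RE.search(location)
        if match is None:
            raise ValueError(f'No XSD version found in schema location {location!r}')
        return int(match.group(1)), int(match.group(2))
    raise ValueError('No root element found')


# Illustrative headers; the example.org URLs are placeholders, not real ClinVar locations
V1_HEADER = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
             b'<ReleaseSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
             b'xsi:noNamespaceSchemaLocation="https://example.org/xsd/clinvar_public_1.71.xsd"/>')
V2_HEADER = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
             b'<ReleaseSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
             b'xsi:noNamespaceSchemaLocation="https://example.org/xsd/clinvar_public_2.0.xsd"/>')

# Choose the parsing code path from the declared version
use_v2_parser = parse_xsd_version(io.BytesIO(V2_HEADER)) >= (2, 0)
```

Returning a `(major, minor)` tuple rather than a float keeps comparisons such as `version >= (2, 0)` correct even once a minor version reaches two digits (as floats, 1.10 would compare lower than 1.9).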

@apriltuesday apriltuesday marked this pull request as ready for review May 24, 2024 15:09
@apriltuesday apriltuesday requested a review from tcezard May 24, 2024 15:09
@apriltuesday apriltuesday self-assigned this May 24, 2024
```python
# Inside the loop over ClinVar records: skip RCVs with more than one classification
if len(clinvar_record.clinical_classifications) > 1:
    logger.warning(f'Found multiple clinical classifications in record {clinvar_record.accession}')
    report.clinvar_skip_multiple_clinical_classifications += 1
    continue
```
Contributor Author


There may be another failure mode we're missing: a record with a single clinical classification whose value is a string that isn't in this enum. This would include values like "Tier I - Strong" and others listed here.

As it stands, this is only caught when the evidence string is generated and validated, because we don't maintain the list of acceptable values in this codebase. I'm not sure whether we should change this, or just let these records be counted as invalid and investigated.
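One way the failure mode above could be caught earlier is sketched below, next to the existing skip for multiple classifications. `ClinicalClassification` and `ClinVarRecord` here are hypothetical minimal stand-ins, not the classes added in this PR; `usable_records`, the reason labels in `skipped` (including the extra `missing_clinical_classification` case) and the allowed values are likewise made up for illustration. Since the codebase doesn't maintain the list of acceptable values, the sketch takes it as a parameter rather than hard-coding it.

```python
import logging
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Set, Tuple

logger = logging.getLogger(__name__)


@dataclass
class ClinicalClassification:
    # Hypothetical minimal model: one classification with its description(s)
    descriptions: List[str] = field(default_factory=list)


@dataclass
class ClinVarRecord:
    # Hypothetical minimal record with only the fields this sketch uses
    accession: str
    clinical_classifications: List[ClinicalClassification] = field(default_factory=list)


def usable_records(records: Iterable[ClinVarRecord], allowed_values: Set[str],
                   skipped: Counter) -> Iterator[Tuple[ClinVarRecord, str]]:
    """Yield (record, description) for records that pass every check.

    Skipped records are counted in `skipped` under a reason label.
    """
    for record in records:
        classifications = record.clinical_classifications
        if len(classifications) > 1 or any(len(c.descriptions) > 1 for c in classifications):
            # More structure than this model supports: skip instead of crashing
            logger.warning(f'Found multiple clinical classifications in record {record.accession}')
            skipped['multiple_clinical_classifications'] += 1
            continue
        if not classifications or not classifications[0].descriptions:
            skipped['missing_clinical_classification'] += 1
            continue
        description = classifications[0].descriptions[0]
        if description not in allowed_values:
            # Single classification, but a value outside the accepted list (e.g. "Tier I - Strong")
            logger.warning(f'Unrecognised clinical classification {description!r} in record {record.accession}')
            skipped['invalid_clinical_classification'] += 1
            continue
        yield record, description


# Usage with made-up accessions and a hypothetical subset of accepted values
ALLOWED = {'Pathogenic', 'Likely pathogenic', 'Benign'}
records = [
    ClinVarRecord('RCV_EXAMPLE_1', [ClinicalClassification(['Pathogenic'])]),
    ClinVarRecord('RCV_EXAMPLE_2', [ClinicalClassification(['Tier I - Strong'])]),
    ClinVarRecord('RCV_EXAMPLE_3', [ClinicalClassification(['Pathogenic']),
                                    ClinicalClassification(['Benign'])]),
]
skipped = Counter()
kept = list(usable_records(records, ALLOWED, skipped))
```

The trade-off is that the allowed-values set would then have to be kept in sync with whatever the evidence string validation accepts; leaving the check to validation avoids that duplication, at the cost of catching these records later.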

@apriltuesday apriltuesday merged commit 21e78db into EBIvariation:master Jun 4, 2024
1 check passed
@apriltuesday apriltuesday deleted the new-xml branch June 4, 2024 07:46
Linked issue: Migrate to new ClinVar XML
2 participants