Update OMIM gene references in Mondo #8108

Draft
wants to merge 6 commits into
base: master

matentzn
Member

@matentzn matentzn commented Aug 24, 2024

I will add this in draft mode, because this needs extremely careful review by at least @twhetzel and @sabrinatoro.

You can rerun the pipeline by checking out the branch and running

sh run.sh make update-omim-genes -B

I don't know how you want to review this, but as a reminder:

  1. The new OMIM references are the 1:1 references we obtain directly from OMIM. We assume those are all "germline mutation in X" relations to the disease.
  2. The pipeline now:
    1. Deletes all direct OMIM relations (excluding logical definitions, which are a whole other beast)
    2. Adds all the new OMIM relations back
    3. Updates equivalence class definitions with SPARQL to use the gene from the updated OMIM relations.
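
For reviewers who want a mental model of steps 1 and 2, here is a minimal sketch on in-memory data. It is not the pipeline code: the `GeneRelation` type and the `refresh_omim_relations` function are hypothetical, and treating `MONDO:mim2gene_medgen` as the marker for "direct OMIM relation" is an assumption based on the source tag that appears in this thread. Step 3 (the SPARQL update) is not covered.

```python
from typing import Iterable, List, NamedTuple


class GeneRelation(NamedTuple):
    """Hypothetical flat view of one gene relation on a Mondo term."""
    mondo_id: str
    gene_iri: str
    source: str


# Assumption: relations carrying this source tag are the "direct OMIM relations".
OMIM_SOURCE = "MONDO:mim2gene_medgen"


def refresh_omim_relations(
    existing: Iterable[GeneRelation],
    fresh_omim: Iterable[GeneRelation],
) -> List[GeneRelation]:
    """Drop every OMIM-sourced relation, then append the fresh OMIM relations."""
    kept = [rel for rel in existing if rel.source != OMIM_SOURCE]
    return kept + list(fresh_omim)
```

Relations with any other source pass through untouched, so whatever was deleted and not re-added shows up as a plain difference between the input and output lists.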

I am sure there is much that needs to be fixed, but I wanted to get the ball rolling at least.

I can't work more on this, but I think it's worth reviewing it and identifying issues.

@twhetzel
Collaborator

Aim to have this in the October release if possible.

update-omim-genes:
$(MAKE) $(TMPDIR)/external/processed-mondo-omim-genes.robot.owl -B
# We need to be less aggressive here, as some gene relations were not originally sourced
# from OMIM, and were added, for example, for ClinGen.
Member Author

From @sabrinatoro: definitely keep gene annotations sourced from clinicalgenome, PMID, and orcid.
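
A less aggressive deletion step could look roughly like the sketch below. The helper names (`is_protected`, `split_relations`) and the `(gene_iri, source)` pair layout are hypothetical, and the exact spelling of the source values in the ontology is an assumption, which is why the match is a case-insensitive substring test rather than an exact comparison.

```python
from typing import Iterable, List, Tuple

# Sources that must survive the update, per the review comments.
PROTECTED_MARKERS = ("clinicalgenome", "pmid", "orcid")

Relation = Tuple[str, str]  # (gene_iri, source), a hypothetical layout


def is_protected(source: str) -> bool:
    """True if the source string mentions one of the protected markers."""
    lowered = source.lower()
    return any(marker in lowered for marker in PROTECTED_MARKERS)


def split_relations(relations: Iterable[Relation]) -> Tuple[List[Relation], List[Relation]]:
    """Partition relations into (keep, candidates_for_deletion)."""
    keep: List[Relation] = []
    drop: List[Relation] = []
    for relation in relations:
        _, source = relation
        (keep if is_protected(source) else drop).append(relation)
    return keep, drop
```

Only the second list would then be handed to the delete step, so curated annotations are never touched.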

@sabrinatoro
Collaborator

I have reviewed this PR carefully and have found the following issues. All of these are pretty major and should be resolved before we can go ahead with this PR.

  1. (already reported here in Nico's comment) The gene annotations with sources from clinicalgenome, PMID, and orcid should be kept.

  2. (already reported here in Nico's comment) Gene identifiers should be ‘http://identifiers.org/hgnc/XXX’ and not ‘https://identifiers.org/hgnc/XXX’

  3. Genes should not be added if the OMIM record is associated with multiple genes.
    Examples in which one gene was incorrectly added to a Mondo record (note that only one of the genes was added; it is unclear why that gene was added and not the other)

  4. Some gene annotations were removed but not added back even though there is 1 (and only 1) gene associated with the OMIM record:
    Examples:
  • MONDO:0007037: OMIM:100800, annotation removed and now missing: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/3690 {source="MONDO:mim2gene_medgen"} ! FGFR3
  • MONDO:0007039 - OMIM:101000, annotation removed and now missing: relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/7773 {source="MONDO:mim2gene_medgen"} ! NF2
  • MONDO:0007041 - OMIM:101200, annotation removed and now missing: relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/3689 {source="MONDO:mim2gene_medgen"} ! FGFR2
  5. The following example is very problematic in many ways. First, the OMIM record is associated with multiple genes, so this update should not have been made.
    MONDO:0007103 ; OMIM:105400 - change:
  6. (also shown above) Since we do not record a source for equivalence definitions, we do not know where they come from. Many of them were created by a curator or ClinGen based on the definition of a term that might not have an OMIM equivalent (e.g. gene-related neuropathy). Maybe we should add sources to the equivalence definitions and take them into account when updating the gene annotations.

  7. Since gene annotations with a PMID/ClinGen/ORCID source should be maintained, I suggest that we add a QC check that flags terms with more than one affected gene. In very few cases (e.g. digenic diseases) having more than one gene is OK, but a curator should sign off.
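
Three of the points above (the `http://` identifier form, skipping OMIM records with several genes, and the QC check for terms with more than one gene) can be sketched together as follows. All function names and the dictionary and tuple layouts are hypothetical; the real fix would presumably live in the OMIM ingest and the Mondo QC queries, so treat this only as a statement of the intended behaviour.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

HTTPS_PREFIX = "https://identifiers.org/hgnc/"
HTTP_PREFIX = "http://identifiers.org/hgnc/"


def normalize_hgnc_iri(iri: str) -> str:
    """Rewrite the https:// HGNC form to the http:// form requested in the review."""
    if iri.startswith(HTTPS_PREFIX):
        return HTTP_PREFIX + iri[len(HTTPS_PREFIX):]
    return iri


def single_gene_records(omim_to_genes: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Keep only OMIM records associated with exactly one distinct gene."""
    result: Dict[str, str] = {}
    for omim_id, genes in omim_to_genes.items():
        unique = {normalize_hgnc_iri(gene) for gene in genes}
        if len(unique) == 1:
            result[omim_id] = next(iter(unique))
    return result


def terms_needing_review(relations: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return Mondo terms that end up with more than one gene, for curator sign-off."""
    genes_by_term = defaultdict(set)
    for mondo_id, gene_iri in relations:
        genes_by_term[mondo_id].add(normalize_hgnc_iri(gene_iri))
    return {
        term: sorted(genes)
        for term, genes in genes_by_term.items()
        if len(genes) > 1
    }
```

For example, `single_gene_records({"OMIM:100800": ["https://identifiers.org/hgnc/3690"]})` keeps the record and returns the gene in its `http://` form, while a record listing two different genes is dropped rather than resolved arbitrarily.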

@twhetzel
Collaborator

I am working through these issues.

@twhetzel
Collaborator

Related to point 3, "Genes should not be added if the OMIM record is associated with multiple genes": in tracing back the OMIM processing steps, I do not see all the genes in one of the initial files, i.e. omim.ttl. Content from this file is further transformed and eventually used in the OMIM pipeline. I think this is a bug in that processing, and I have submitted an issue in our OMIM repo about it. I am mentioning this here in case others are not subscribed to updates in that repo.

3 participants