ICD11 Ingest #434

Merged: 3 commits, Apr 12, 2024
6 changes: 6 additions & 0 deletions docs/developer/add-new-source.md
@@ -21,13 +21,19 @@ Add a new metadata file to [src/ontology/metadata](https://github.com/monarch-in
Prefixes need to be entered in the following places in the yml:
- `curie_map`
- `extended_prefix_map`
- `subject_prefixes`

### 2.3. `config/prefixes.csv`
Add prefixes.

### 2.4. `config/context.json`
Add prefixes.

### 2.5. `lexmatch-sssom-compare.py`
There is a section of branching logic with a comment "Map ontology filenames to prefixes". Add an entry there if either
(a) there is 1 prefix you care about, and it is spelled differently than the component filename (e.g. the prefix is
`myontology`, but the filename is `components/my-ontology.owl`), or (b) there is more than 1 prefix.
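
The rule above can be sketched as a lookup table instead of an `if`/`elif` chain. This is a minimal illustration, not the script's actual code: the names `PREFIX_OVERRIDES` and `prefix_pattern` are hypothetical, and escaping the prefixes with `re.escape` is an assumption added here so that the `.` in `icd11.foundation` matches literally when the result is used as a regular expression.

```python
import re

# Hypothetical override table: component filename stem -> prefixes to match.
# Only entries that differ from the default (the uppercased stem) are needed.
PREFIX_OVERRIDES = {
    "omim": ["OMIM", "OMIMPS"],             # case (b): more than one prefix
    "ordo": ["ORDO", "Orphanet"],           # case (b): more than one prefix
    "icd11foundation": ["icd11.foundation"],  # case (a): spelled differently
}


def prefix_pattern(ont: str) -> str:
    """Return a regex alternation of the prefixes for a component filename stem."""
    prefixes = PREFIX_OVERRIDES.get(ont, [ont.upper()])
    # Escape each prefix so characters such as "." are matched literally.
    return "|".join(re.escape(p) for p in prefixes)
```

A stem with no entry, such as `doid`, falls through to the default and yields `DOID`.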

## 3. Docs
### 3.1. `mkdocs.yaml`
Update the Website Table of Contents in [mkdocs.yaml](https://github.com/monarch-initiative/mondo-ingest/blob/main/mkdocs.yaml)
23 changes: 16 additions & 7 deletions docs/sources/icd11foundation.md
@@ -2,20 +2,29 @@

**Source name:** International Classification of Diseases 11th Revision

**Source description:** The International Classification of Diseases (ICD) provides a common language that allows health professionals to share standardized information across the world. The eleventh revision contains around 17 000 unique codes, more than 120 000 codable terms and is now entirely digital.Feb 11, 2022
**Source description:** The International Classification of Diseases (ICD) provides a common language that allows health
professionals to share standardized information across the world. The eleventh revision contains around 17 000 unique
codes, more than 120 000 codable terms and is now entirely digital.Feb 11, 2022
This data source in particular is the ICD11 foundation, not one of its linearizations.


**Homepage:** https://icd.who.int/

**Comments about this source:**
Because the existing logical equivalence class axioms led to equivalence cliques (groups of distinct disease identifiers
that inferred to he semantically identical) we decided to strip out all equivalence class axiom from the foundation
prior to processing it in the ingest.

**Comments about this source:**
_Data source_
_Original source URL_: https://icd11files.blob.core.windows.net/tmp/whofic-2023-04-08.owl.gz

_Preprocessing_
In the [monarch-initiative/icd11](https://github.com/monarch-initiative/icd11) repo, We remove unicode characters and
then remove equivalent class statements as discussed below.

_Equivalent classes_
We remove all equivalent class statements as they are not unique and result in unintended node merges. For example
`icd11.foundation:2000662282` (_Occupant of pick-up truck or van injured in collision with car, pick-up truck or van:
person on outside of vehicle injured in traffic accident_) has the same exact equivalent concept expression as
`icd11.foundation:1279712844` (_Occupant of pick-up truck or van injured in collision with two- or three- wheeled motor
vehicle: person on outside of vehicle injured in traffic accident_).
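
One way to check for such collisions is to group named classes by the serialized form of their `owl:equivalentClass` expression. The sketch below is an assumption-laden illustration, not the preprocessing used in the icd11 repo: it assumes the source is RDF/XML, that each expression is nested under its class element, and the function name `equivalence_collisions` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def equivalence_collisions(owl_xml: str) -> list:
    """Group named classes that share an identical equivalent-class expression."""
    root = ET.fromstring(owl_xml)
    by_expression = defaultdict(list)
    for cls in root.iter(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")
        if iri is None:  # skip anonymous class expressions
            continue
        for eq in cls.findall(f"{{{OWL}}}equivalentClass"):
            # Canonicalize the serialized expression so it can serve as a dict key.
            raw = ET.tostring(eq, encoding="unicode")
            key = ET.canonicalize(xml_data=raw, strip_text=True)
            by_expression[key].append(iri)
    # Keep only expressions that are shared by more than one class.
    return [sorted(iris) for iris in by_expression.values() if len(iris) > 1]
```

Each returned group is a set of class IRIs that a reasoner would treat as equivalent to the same expression, and therefore to each other.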

---

The data pipeline that generates the source is implemented in `make`, in this source file: [src/ontology/mondo-ingest.Makefile](https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/mondo-ingest.Makefile).

2 changes: 2 additions & 0 deletions src/ontology/config/context.json
@@ -31,6 +31,8 @@
"NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
"ICD10CM": "http://purl.bioontology.org/ontology/ICD10CM/",
"ICD10WHO": "http://apps.who.int/classifications/icd10/browse/2010/en#/",
"icd11.foundation": "http://id.who.int/icd/entity/",
"icd11.z": "http://who.int/icd#Z_",
"OMIMPS": "https://omim.org/phenotypicSeries/PS",
"MONDOREL": "http://purl.obolibrary.org/obo/mondo#"
}
5 changes: 5 additions & 0 deletions src/ontology/config/icd11foundation-property-map.sssom.tsv
@@ -0,0 +1,5 @@
subject_id object_id
http://id.who.int/icd/schema/isObsolote owl:deprecated
http://id.who.int/icd/schema/longDefinition http://purl.org/dc/terms/description
http://id.who.int/icd/schema/note rdfs:comment
skos:definition IAO:0000115
2 changes: 2 additions & 0 deletions src/ontology/config/properties.txt
@@ -20,6 +20,8 @@ http://www.w3.org/2004/02/skos/core#narrowMatch
http://www.w3.org/2004/02/skos/core#relatedMatch
http://www.w3.org/2004/02/skos/core#exactMatch
http://www.w3.org/2004/02/skos/core#closeMatch
rdfs:comment
rdfs:label
rdfs:seeAlso
owl:deprecated
http://purl.org/dc/terms/description
14 changes: 14 additions & 0 deletions src/ontology/metadata/icd11foundation.yml
@@ -10,6 +10,20 @@ description: >
120 000 codable terms and is now entirely digital.Feb 11, 2022
This data source in particular is the ICD11 foundation, not one of its linearizations.
comments_about_this_source: >
_Data source_
_Original source URL_: https://icd11files.blob.core.windows.net/tmp/whofic-2023-04-08.owl.gz
_Preprocessing_
In the [monarch-initiative/icd11](https://github.com/monarch-initiative/icd11) repo, We remove unicode characters and
then remove equivalent class statements as discussed below.
_Equivalent classes_
We remove all equivalent class statements as they are not unique and result in unintended node merges. For example
`icd11.foundation:2000662282` (_Occupant of pick-up truck or van injured in collision with car, pick-up truck or van:
person on outside of vehicle injured in traffic accident_) has the same exact equivalent concept expression as
`icd11.foundation:1279712844` (_Occupant of pick-up truck or van injured in collision with two- or three- wheeled motor
vehicle: person on outside of vehicle injured in traffic accident_).
homepage: https://icd.who.int/
base_prefix_map:
icd11.foundation: http://id.who.int/icd/entity/
1 change: 1 addition & 0 deletions src/ontology/metadata/mondo.sssom.config.yml
@@ -367,6 +367,7 @@ subject_prefixes:
- EFO
- ICD10CM
- ICD10WHO
- icd11.foundation
- OMIMPS
- NCIT
- DOID
13 changes: 13 additions & 0 deletions src/ontology/mondo-ingest.Makefile
@@ -159,6 +159,17 @@ $(COMPONENTSDIR)/icd10who.owl: $(TMPDIR)/icd10who_relevant_signature.txt | compo
remove -T config/properties.txt --select complement --select properties --trim true \
annotate --ontology-iri $(URIBASE)/mondo/sources/icd10who.owl --version-iri $(URIBASE)/mondo/sources/$(TODAY)/icd10who.owl -o $@; fi

$(COMPONENTSDIR)/icd11foundation.owl: $(TMPDIR)/icd11foundation_relevant_signature.txt | component-download-icd11foundation.owl
@joeflack4 (Contributor Author), Mar 4, 2024:

foundationReference @probably-can-do-after-PR

Details

In the property discussion thread, I wrote:

include but also put inside rdfs:comment?

What to do?
What should I do in this case? Should I just ignore it? Should I integrate it into Mondo somehow, e.g. by including in rdfs:comment?

Analysis?
Perhaps I need to show some cases of how this is used / what it looks like?

Member:

Examples?

@joeflack4 (Contributor Author), Mar 13, 2024:

@matentzn @twhetzel Alright, so it looks like foundationReference works something like this. It exists only on annotation axioms for annotatedProperty icd11.foundation:inclusion and icd11.foundation:exclusion.

How I determined it was annotating only these 2

  1. cat tmp/component-download-icd11foundation.owl.owl | grep foundationReference -B 2 > ~/Desktop/foundationRef.txt
  2. cat ~/Desktop/foundationRef.txt| grep annotatedProperty > ~/Desktop/foundationrefprops.txt
  3. Looked at the unique values.
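
The same check can be done without `grep` context windows by parsing the file. This is a sketch under assumptions: the `schema:` prefix is taken to bind to `http://id.who.int/icd/schema/` (inferred from the `annotatedProperty` IRIs in the example below), and the function name `annotated_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SCHEMA = "http://id.who.int/icd/schema/"  # assumed binding of the schema: prefix


def annotated_properties(owl_xml: str) -> Counter:
    """Count annotatedProperty IRIs over axioms that carry a foundationReference."""
    root = ET.fromstring(owl_xml)
    counts = Counter()
    for axiom in root.iter(f"{{{OWL}}}Axiom"):
        # Ignore annotation axioms without a foundationReference.
        if axiom.find(f"{{{SCHEMA}}}foundationReference") is None:
            continue
        prop = axiom.find(f"{{{OWL}}}annotatedProperty")
        if prop is not None:
            counts[prop.get(f"{{{RDF}}}resource")] += 1
    return counts
```

The keys of the returned counter are the unique values from step 3. For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`.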

As I wrote in #462:

I'm not 100% sure, but on their face these appear to be a way to add modifications beyond a normal subclass relationship. You mark something as a subclass of another class, but it might then inherit some other relationships, which you can mark with exclusion. And then perhaps you can mark inclusion for some other classes that do not have a subclass relationship.

Example usage in OWL

For the first class, I left all properties in for reference, but for the other classes I left them out and put a ... in their place. The first class (1000337196) actually had 2 axioms annotating its exclusions; I left one out. The axiom that I left in refers to the 2nd class (486722075) in the example. That class also has 1 exclusion with an axiom annotation on it, which refers to the 3rd class (1868408442) in the example, which itself has 4 exclusions, each with an annotation axiom.

    <owl:Class rdf:about="http://id.who.int/icd/entity/1000337196">
        <rdfs:subClassOf rdf:resource="http://id.who.int/icd/entity/1632259883"/>
        <schema:browserUrl rdf:resource="https://icd.who.int/dev11/f/en#/http%3A%2F%2Fid.who.int%2Ficd%2Fentity%2F1000337196"/>
        <schema:exclusion xml:lang="en">Catatonia</schema:exclusion>
        <schema:exclusion>Delirium</schema:exclusion>
        <schema:inclusion xml:lang="en">Semicoma</schema:inclusion>
        <skos:altLabel xml:lang="en">Semicoma</skos:altLabel>
        <skos:altLabel xml:lang="en">stuporous</skos:altLabel>
        <skos:definition xml:lang="en">Total or nearly total lack of spontaneous movement and marked decrease in reactivity to environment.</skos:definition>
        <skos:prefLabel xml:lang="en">Stupor</skos:prefLabel>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://id.who.int/icd/entity/1000337196"/>
        <owl:annotatedProperty rdf:resource="http://id.who.int/icd/schema/exclusion"/>
        <owl:annotatedTarget xml:lang="en">Catatonia</owl:annotatedTarget>
        <schema:foundationReference rdf:resource="http://id.who.int/icd/entity/486722075"/>
    </owl:Axiom>
    ...



    <owl:Class rdf:about="http://id.who.int/icd/entity/486722075">
        ...
        <schema:exclusion>Harmful effects of drugs, medicaments or biological substances, not elsewhere classified</schema:exclusion>
        <skos:prefLabel xml:lang="en">Catatonia</skos:prefLabel>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://id.who.int/icd/entity/486722075"/>
        <owl:annotatedProperty rdf:resource="http://id.who.int/icd/schema/exclusion"/>
        <owl:annotatedTarget>Harmful effects of drugs, medicaments or biological substances, not elsewhere classified</owl:annotatedTarget>
        <schema:foundationReference rdf:resource="http://id.who.int/icd/entity/1868408442"/>
    </owl:Axiom>



    <owl:Class rdf:about="http://id.who.int/icd/entity/1868408442">
        ...
        <schema:exclusion xml:lang="en">Alcohol intoxication</schema:exclusion>
        <schema:exclusion>Allergic or hypersensitivity conditions</schema:exclusion>
        <schema:exclusion>Disorders due to substance use or addictive behaviours</schema:exclusion>
        <schema:exclusion>Reactions or intoxications due to drugs administered to fetus or newborn</schema:exclusion>
        <schema:inclusion xml:lang="en">overdose of these substances</schema:inclusion>
        <schema:inclusion xml:lang="en">wrong substance given or taken in error</schema:inclusion>
        ...
        <skos:prefLabel xml:lang="en">Harmful effects of drugs, medicaments or biological substances, not elsewhere classified</skos:prefLabel>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://id.who.int/icd/entity/1868408442"/>
        <owl:annotatedProperty rdf:resource="http://id.who.int/icd/schema/exclusion"/>
        <owl:annotatedTarget xml:lang="en">Alcohol intoxication</owl:annotatedTarget>
        <schema:foundationReference rdf:resource="http://id.who.int/icd/entity/1339202943"/>
    </owl:Axiom>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://id.who.int/icd/entity/1868408442"/>
        <owl:annotatedProperty rdf:resource="http://id.who.int/icd/schema/exclusion"/>
        <owl:annotatedTarget>Allergic or hypersensitivity conditions</owl:annotatedTarget>
        <schema:foundationReference rdf:resource="http://id.who.int/icd/entity/642618805"/>
    </owl:Axiom>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://id.who.int/icd/entity/1868408442"/>
        <owl:annotatedProperty rdf:resource="http://id.who.int/icd/schema/exclusion"/>
        <owl:annotatedTarget>Disorders due to substance use or addictive behaviours</owl:annotatedTarget>
        <schema:foundationReference rdf:resource="http://id.who.int/icd/entity/1602669465"/>
    </owl:Axiom>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://id.who.int/icd/entity/1868408442"/>
        <owl:annotatedProperty rdf:resource="http://id.who.int/icd/schema/exclusion"/>
        <owl:annotatedTarget>Reactions or intoxications due to drugs administered to fetus or newborn</owl:annotatedTarget>
        <schema:foundationReference rdf:resource="http://id.who.int/icd/entity/142627676"/>
    </owl:Axiom>

So, inclusion and exclusion hold labels for classes. The URI to the foundation class (foundationReference) is then annotated on those properties.
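
To make that concrete, the annotation axioms can be read back into a lookup from each inclusion or exclusion label to the class it stands for. This is a sketch only: the `schema:` namespace IRI is inferred from the example above, and `foundation_references` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SCHEMA = "http://id.who.int/icd/schema/"  # assumed binding of the schema: prefix


def foundation_references(owl_xml: str) -> dict:
    """Map (source IRI, property IRI, label) to the class IRI the label stands for."""
    resource = f"{{{RDF}}}resource"
    root = ET.fromstring(owl_xml)
    refs = {}
    for axiom in root.iter(f"{{{OWL}}}Axiom"):
        source = axiom.find(f"{{{OWL}}}annotatedSource")
        prop = axiom.find(f"{{{OWL}}}annotatedProperty")
        target = axiom.find(f"{{{OWL}}}annotatedTarget")
        ref = axiom.find(f"{{{SCHEMA}}}foundationReference")
        # Skip axioms that are not complete foundationReference annotations.
        if any(node is None for node in (source, prop, target, ref)):
            continue
        key = (source.get(resource), prop.get(resource), target.text)
        refs[key] = ref.get(resource)
    return refs
```

For the Stupor class above, looking up its `exclusion` labelled "Catatonia" would return the IRI of class 486722075.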

Why not just make inclusion and exclusion point to URIs for the classes instead of their labels? I suppose it probably has something to do with linearization. I assume new class IDs get generated at linearization-time, but the labels remain static.

This comment was marked as outdated.

if [ $(COMP) = true ] ; then $(ROBOT) remove -i $(TMPDIR)/component-download-icd11foundation.owl.owl --select imports \
rename --mappings config/property-map-1.sssom.tsv --allow-missing-entities true \
rename --mappings config/icd11foundation-property-map.sssom.tsv \
remove -T $(TMPDIR)/icd11foundation_relevant_signature.txt --select complement --select "classes individuals" --trim false \
remove -T $(TMPDIR)/icd11foundation_relevant_signature.txt --select individuals \
query \
--update ../sparql/fix-labels-with-brackets.ru \
remove -T config/properties.txt --select complement --select properties --trim true \
annotate --ontology-iri $(URIBASE)/mondo/sources/icd11foundation.owl --version-iri $(URIBASE)/mondo/sources/$(TODAY)/icd11foundation.owl -o $@; fi

$(COMPONENTSDIR)/gard.owl: $(TMPDIR)/gard_relevant_signature.txt | component-download-gard.owl
if [ $(COMP) = true ]; then $(ROBOT) remove -i $(TMPDIR)/component-download-gard.owl.owl --select imports \
remove -T $(TMPDIR)/gard_relevant_signature.txt --select complement --select "classes individuals" --trim false \
@@ -246,6 +257,7 @@ $(REPORTDIR)/%_term_exclusions.txt $(REPORTDIR)/%_exclusion_reasons.robot.templa
--config-path metadata/$*.yml \
--outpath-txt $(REPORTDIR)/$*_term_exclusions.txt \
--outpath-robot-template-tsv $(REPORTDIR)/$*_exclusion_reasons.robot.template.tsv
.PRECIOUS: $(REPORTDIR)/%_exclusion_reasons.robot.template.tsv

$(REPORTDIR)/%_exclusion_reasons.ttl: component-download-%.owl $(REPORTDIR)/%_exclusion_reasons.robot.template.tsv
$(ROBOT) template --input $(TMPDIR)/component-download-$*.owl.owl --add-prefixes config/context.json --template $(REPORTDIR)/$*_exclusion_reasons.robot.template.tsv --output $(REPORTDIR)/$*_exclusion_reasons.ttl
@@ -476,6 +488,7 @@ slurp/%.tsv: $(COMPONENTSDIR)/%.owl $(TMPDIR)/mondo.sssom.tsv $(REPORTDIR)/%_map
--mondo-terms-path $(REPORTDIR)/mirror_signature-mondo.tsv \
--slurp-dir-path slurp/ \
--outpath $@
.PRECIOUS: slurp/%.tsv
Contributor Author:

More rm issues

I checked the build log from last week, and at the end it showed rm reports/icd11foundation_exclusion_reasons.robot.template.tsv slurp/icd11foundation.tsv imports/ro_terms_combined.txt. After adding that 1 .PRECIOUS, now the slurp file is no longer being removed, but the other two still are: rm reports/icd11foundation_exclusion_reasons.robot.template.tsv imports/ro_terms_combined.txt. I think we want these, so I'm going to apply the same fix for both of those as well. I pasted more of the log from build-mondo-ingest below just in case it's of any value.

Log related to (3)

I have the full logs saved as .txt files also if interested

From the build last week:

Release files are now in ../.. - now you should commit, push and make a release         on your git hosting site such as GitHub or GitLab
rm reports/icd11foundation_exclusion_reasons.robot.template.tsv slurp/icd11foundation.tsv imports/ro_terms_combined.txt
make[1]: Leaving directory '/work/src/ontology'
Mondo Ingest has been fully completed

From the build today:

Release files are now in ../.. - now you should commit, push and make a release         on your git hosting site such as GitHub or GitLab
rm reports/icd11foundation_exclusion_reasons.robot.template.tsv imports/ro_terms_combined.txt
make[1]: Leaving directory '/work/src/ontology'
Mondo Ingest has been fully completed
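
The two excerpts above can also be compared mechanically. The following is a small sketch, assuming only that `make` prints one `rm <file> ...` line when it deletes intermediate files, as in these logs; the function names are hypothetical.

```python
def removed_files(log_text: str) -> set:
    """Collect the paths listed on `rm ...` lines of a build log."""
    removed = set()
    for line in log_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "rm":
            # Skip option flags such as -f; keep only the file paths.
            removed.update(p for p in parts[1:] if not p.startswith("-"))
    return removed


def no_longer_removed(old_log: str, new_log: str) -> set:
    """Return the files deleted in the old build but kept in the new one."""
    return removed_files(old_log) - removed_files(new_log)
```

Applied to the two logs above, the difference is the slurp file, and `removed_files` on today's log lists the two files that still need a `.PRECIOUS` entry.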

I've followed up on these two remaining issues here:


.PHONY: slurp-%
slurp-%: slurp/%.tsv
5 changes: 4 additions & 1 deletion src/scripts/lexmatch-sssom-compare.py
@@ -173,11 +173,14 @@ def extract_unmapped_matches(input: str, matches: TextIO, output_dir: str, summa
ont_df_list = []

for _, ont in enumerate(input):
# Map ontology filenames to prefixes
ont2 = ont.upper()
if ont == "omim":
ont2 = "|".join((["OMIM", "OMIMPS"]))
elif ont == "ordo":
ont2 = "|".join((["ORDO", "Orphanet"]))
elif ont == "icd11foundation":
ont2 = 'icd11.foundation'

mondo_ont_df = msdf_mondo.df[condition_mondo_sssom_subj & msdf_mondo.df['object_id'].str.contains(ont2)]
mondo_ont_lex_df = lex_df[(condition_lex_df_mondo_subj & lex_df['object_id'].str.contains(ont2))]
@@ -201,7 +204,7 @@ def extract_unmapped_matches(input: str, matches: TextIO, output_dir: str, summa

ont_df_list.append(unmapped_ont_df)

combined_df = pd.concat(ont_df_list)
combined_df = pd.concat(ont_df_list) if ont_df_list else pd.DataFrame()

combined_msdf = MappingSetDataFrame(
df=combined_df, converter=msdf_lex.converter, metadata=msdf_lex.metadata