
Lexmatch revamp #397

Merged: 7 commits merged into main on Jan 29, 2024
Conversation

matentzn (Member) commented Jan 22, 2024:

This PR

  • adds a few missing prefixes to merged.db (through prefixes.csv)
  • replaces the prefix_map parameter with the converter parameter in src/scripts/lexmatch-sssom-compare.py
  • skips meta in src/scripts/match-mondo-sources-all-lexical.py, which is not urgently needed and was causing all the problems @joeflack4 is fixing
  • amends the preprocessing queries that were adding junk xrefs to the pipeline
  • updates src/ontology/config/prefixes.csv to cover all relevant prefixes for the matching
  • updates src/ontology/metadata/* configs to avoid conflicting namespaces

@joeflack4 I am not reading your answers etc. - please review this PR with @twhetzel and see if you have any concerns with these changes.

To test this PR if you like:

docker pull obolibrary/odkfull:dev
ODK_TAG=dev ./run.sh make lexmatch/README.md -B IMP=false PAT=false MIR=false

I ran it locally - now testing the whole pipeline (I circumvent the main problem you are trying to solve by simply ditching meta here, hoping it's not needed).

@@ -85,7 +85,8 @@ def main(verbose: int, quiet: bool):
 def run(input: str, config: str, rules: str, rejects: str, output: str):
     # Implemented `meta` param in `lexical_index_to_sssom`

-    meta = get_metadata_and_prefix_map(config)
+    #meta = get_metadata_and_prefix_map(config)
+    meta = None
joeflack4 (Contributor) commented Jan 22, 2024:

lexical_index_to_sssom(meta=None)

Nico wrote:

> skips meta in src/scripts/match-mondo-sources-all-lexical.py which is not urgently needed and was causing all the problems @joeflack4 is fixing

You are more familiar with all of the lexmatch outputs, and you say that running it locally was fine, so it seems there are no significant negative side effects from this.

You say it is not urgently needed, so I take it there is still a desire to eventually do what I did in #394 (pass a Converter) or similar, perhaps after addressing INCATools/ontology-access-kit#698. So perhaps I should open a medium-priority ticket for this, or keep my current PR open and rebase from / conform to what you have here.

Not important, but to clarify: this wasn't causing all the problems I am fixing. Rather, it is a solution for the last remaining unfixed (conditional) problem in #394.
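For context, the two approaches being compared (passing an explicit prefix map/Converter as in #394, versus passing meta=None and relying on library defaults as in this PR) can be illustrated with a stdlib-only sketch. The function below is a hypothetical stand-in for illustration, not the actual oaklib/sssom API:

```python
# Hypothetical stand-in for the behavior under discussion: an explicit
# prefix map (like a curies Converter) drives CURIE expansion; passing
# None falls back to whatever defaults the library ships with.
DEFAULT_PREFIX_MAP = {"MONDO": "http://purl.obolibrary.org/obo/MONDO_"}

def expand_curie(curie, prefix_map=None):
    """Expand a CURIE such as 'OMIMPS:214100' to a full URI."""
    pm = DEFAULT_PREFIX_MAP if prefix_map is None else prefix_map
    prefix, local_id = curie.split(":", 1)
    if prefix not in pm:
        raise KeyError(f"Unknown prefix: {prefix}")
    return pm[prefix] + local_id

# Explicit map (analogous to passing a Converter in #394):
custom = {"OMIMPS": "https://omim.org/phenotypicSeries/PS"}
print(expand_curie("OMIMPS:214100", custom))
# -> https://omim.org/phenotypicSeries/PS214100

# prefix_map=None (analogous to meta=None here): only defaults are known.
print(expand_curie("MONDO:0005015"))
# -> http://purl.obolibrary.org/obo/MONDO_0005015
```

The practical difference is only which prefixes are recognized at expansion time; if the defaults cover everything the pipeline emits, the two paths produce the same output.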

joeflack4 (Contributor) commented:

Trish wrote:

> @joeflack4 have you also run the entire pipeline locally? Did you find that Nico's solution to "ditch meta" worked as intended?

I haven't run the whole pipeline locally. I'll give it a go, but I'm terrified of it; I've heard it takes over 24 hours, and sometimes this can interfere with other projects I'm working on. My computer wasn't actually capable of doing this until last month (without skipping some things like NCIT). When I've asked Nico if he wants me to do this before, he's said I could hold off. But I'll at least give it a go.

Regarding checking the results of meta=None myself, I'll give that a go. I was trusting Nico, and it sounds like he got the results he expected, but it's good to have a second pair of eyes, especially if you think so. Another difference I just noticed is that I've also been passing no meta (same as passing None), but I've been passing prefix_map instead. I'm not sure it matters in this case, but I'd better remove that as well and run something closer to what Nico is running. I'd check out his PR as well, but I think it's better to run on my PR, because Nico made some other changes to prefixes.csv and I want to more closely compare the outputs I got before/after the simple meta=None/prefix_map=None change.

@@ -85,7 +85,8 @@ def main(verbose: int, quiet: bool):
 def run(input: str, config: str, rules: str, rejects: str, output: str):
     # Implemented `meta` param in `lexical_index_to_sssom`

-    meta = get_metadata_and_prefix_map(config)
+    #meta = get_metadata_and_prefix_map(config)
+    meta = None
joeflack4 (Contributor) commented:

#394 redundancy

The main difference is that #397 passes meta=None to lexical_index_to_sssom(), whereas #394 passes a Converter. Supposedly the #397 approach is fine.

Everything else in that PR is now cosmetic, so I can merge just those changes afterwards, or ignore and close.

My instinct is to feel a little irked that I spent a lot of work and found a way to get the pipeline working without throwing the oio error (although #394 currently has oio removed and loses outputs), and that my work, which I think had the same overall effect, was supplanted by Nico's and Charlie's PRs. But maybe I shouldn't feel irked; I laid the groundwork, this is just the nature of collaboration, and Nico and Charlie are more senior here and have a better idea of how things should be done.

Also, I see that this PR does add more than #394, such as missing prefixes.

@@ -227,12 +227,27 @@ ICD10CM,http://purl.bioontology.org/ontology/ICD10CM/
 ICD10CM2,https://icd.codes/icd10cm/
 ICD10WHO,https://icd.who.int/browse10/2019/en#/
 ICD10WHO2010,http://apps.who.int/classifications/icd10/browse/2010/en#/
-OMIMPS,https://www.omim.org/phenotypicSeries/PS
+OMIMPS,https://omim.org/phenotypicSeries/PS
joeflack4 (Contributor) commented:

prefixes.csv updates

These changes will eventually be supplanted by dynamic creation of prefixes.csv, but they seem necessary for now.

Harshad and I should have thought to check the status of this sooner.
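As a stdlib-only sketch of what loading these mappings might look like (assuming the simple PREFIX,URI row layout visible in the diff above; the function name is hypothetical):

```python
import csv
import io

# Sample rows in the PREFIX,URI layout shown in the diff above.
prefixes_csv = """\
ICD10WHO,https://icd.who.int/browse10/2019/en#/
OMIMPS,https://omim.org/phenotypicSeries/PS
"""

def load_prefix_map(fh):
    """Read PREFIX,URI rows into a dict, skipping blank lines."""
    return {row[0]: row[1] for row in csv.reader(fh) if row}

prefix_map = load_prefix_map(io.StringIO(prefixes_csv))
print(prefix_map["OMIMPS"])  # -> https://omim.org/phenotypicSeries/PS
```

A dynamically generated prefixes.csv would presumably be consumed the same way, which is why keeping the manual entries consistent matters in the meantime.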

src/sparql/fix_xref_prefixes.ru (resolved)
src/sparql/rm_xref_by_prefix.ru (resolved)
@joeflack4 added the "bug (Something isn't working)" label on Jan 23, 2024
@joeflack4 linked an issue on Jan 23, 2024 that may be closed by this pull request
joeflack4 (Contributor) left a review comment:

I trust your review of the outputs, and if you say it is working locally and that you're verifying by running the pipeline, I'm fine for you / us to merge this after that verification has been made!

twhetzel (Contributor) commented:

> I ran it locally - now testing the whole pipeline (I circumvent the main problem you are trying to solve by simply ditching meta here, hoping it's not needed).

@joeflack4 have you also run the entire pipeline locally? Did you find that Nico's solution to "ditch meta" worked as intended?

matentzn (Member, Author) left a comment:

Approved... phew!

@twhetzel @joeflack4

I looked through all the results - they look correct to me. A lot of issues fixed, way beyond the original problem with the prefixes.

I won't be looking at this anymore this week - please merge this, write down any questions you have in a Google doc so we can document things more permanently, and once I see the merge, I will immediately make another run.

@joeflack4 if you want to retain the contents of your comments, please move them to issues.

This was a bit painful, sorry about that. The source of all problems was a major refactor of sssom-py and curies, which led to 3 completely independent major issues all around the pipeline. The debugging process has led to some good fixes in curies (thanks @joeflack4), and I hope some better understanding of what is happening. Let's keep one thing in mind for the future:

Never ever again do we merge a branch unless a full run of the pipeline was successful. I can be responsible for the runs until we have a runner on GCP.

@twhetzel I will post a comment on Slack now to advise Nicole and Sabrina to push forward the syncing.

twhetzel (Contributor) left a comment:

Nico mentioned all looks good, and it looks good to me at this point.

# SNOMEDCT_US_2021_09_01: http://identifiers.org/snomedct/
# SNOMEDCT_US_2022_07_31: http://identifiers.org/snomedct/
# SNOMEDCT_US_2022_10_31: http://identifiers.org/snomedct/
# SNOMEDCT_US_2023_02_28: http://identifiers.org/snomedct/
twhetzel (Contributor) commented:

@matentzn why are these commented out vs. removed or what are these comments a reminder of in the future?

@twhetzel merged commit 28166e2 into main on Jan 29, 2024
@twhetzel deleted the lexmatch-revamp branch on January 29, 2024 at 05:27

Labels: bug (Something isn't working)
Linked issue that may be closed by this pull request: Bug in match-mondo-sources-all-lexical.py
3 participants