-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade externally managed content #628
base: develop
Are you sure you want to change the base?
Conversation
Unfortunately, I cant find the orginal TSV file, but it is highly unlikely this file will ever change.
in src/ontology ``` wget http://purl.obolibrary.org/obo/mondo.owl -O mondo.owl sh run.sh make tmp/mondo-incl-robot-report.tsv ```
…ged content cc @twhetzel check this commit to look at the emc error fixing pipeline in isolation
Next steps:
|
Redesign of CLI, function names ectc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one change request (related to EFO proxy merges, see comments) which I would like your opinion on @twhetzel, else I think this is good to go. You can run the pipeline with:
sh run.sh make update-externally-managed-content -B
However, it does take a long time, so strap in. The biggest downside of this PR is the fact that I chose to properly simulate the EMC pipeline, which adds a big overhead on the runtime.
EDIT: Not that long in the end, at least if you discount the mondo repo refresh:
### DEBUG STATS ###
Elapsed time: 0:13.96
Peak memory: 1669976 kb
I just (24.08.2024, 2pm EEST) finished a complete run of the external pipeline, and all looks good to me. So remains to be seen what you think of my comments and your general review.
| MONDO:0007030 | | | Aarskog Syndrome | | | | | | nord | duplicate_scoped_synonym (oboInOwl:hasExactSynonym) | | ||
| MONDO:0007306 | | | Klippel-Feil Syndrome | | | | | | nord | duplicate_scoped_synonym (oboInOwl:hasBroadSynonym) | | ||
| MONDO:0007389 | | | Spondylothoracic Dysplasia | | | | | | nord | duplicate_scoped_synonym (oboInOwl:hasExactSynonym) | | ||
| MONDO:0005676 | | | | | | http://purl.obolibrary.org/obo/mondo#nord_rare | | | nord | qc-animal-disease-rare (oboInOwl:inSubset) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am glad this made it through after some fiddling!
|
||
$(TMPDIR)/mondo-with-simulated-emc-run.owl: $(TMPDIR)/mondo_repo_built | ||
mkdir -p $(TMPDIR)/mondo/src/ontology/tmp/external/ | ||
$(foreach n,$(EXTERNAL_FILES), cp -f $(EXTERNAL_CONTENT_DIR)/$(n).robot.owl $(TMPDIR)/mondo/src/ontology/tmp/external/processed-$(n).robot.owl;) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line here is the crux of the whole PR. Instead of creating our own EMC pipeline here, we are letting the real EMC pipeline in the Mondo repo run. This works basically by copying the unprocessed files over into the mondo repo, running the EMC pipeline there as it is, and then copying the mondo edit back here for running our new QC system.
src/ontology/mondo-ingest.Makefile
Outdated
# Unfortunately, @matentzn cant find the orginal TSV file, but it is highly unlikely this file will ever change. | ||
|
||
$(EXTERNAL_CONTENT_DIR)/efo-proxy-merges.robot.owl: | ||
echo "WARNING: $@ is currently manually curated!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my self review, I am a bit inclined now to drop this file entirely from the pipeline and have this being fixed on the EFO side - like with all other content we do. Here is what this file does:
- There are 3 cases or 4 where two EFO ids map to the same Mondo ID
- This file makes that valid by adding "preferred external" to one of the EFO ids in each pair
I feel now on second thought that we should not easily validate externally provided proxy merged IDs, and work with the upstream sources to fix them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matentzn I agree that we should not fix these on our side, but work with the provider of the source to fix these or a a minimum have the provider determine which should be tagged as preferred. There is also usually some curator work to determine which one should be the "preferred external" so automatically choosing which ID skips this needed work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have fixed it now. I made a test here: https://github.com/monarch-initiative/mondo/pull/8120/files#r1734450583
This commit describes the removal of that part from the pipeline: 52fdd09
Note that I have yet deleted the efo-merges owl file from this PR, but I have removed all its content, as there is a cyclic dependency here that was too cumbersome to remove. If you like you can remove these files after first removing references to this file from mondo.Makefile, e.g.:
cp mondo-edit.owl mondo-edit.owl.bak && \ | ||
cp mondo-edit.obo mondo-edit.obo.bak && \ | ||
make update-external-content-incl-rare -B MIR=false IMP=false MIR=false DOWNLOAD_EXTERNAL=false &&\ | ||
make mondo-edit.owl MIR=false IMP=false MIR=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this goal in the Makefile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it has already made its way into mondo
makefile on main
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all that you see here is in the context of the mondo
, not mondo-ingest
pipeline!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will Approve for now, but I do think that proxy merges from EFO should not be fixed automatically related to this comment. I don't know if there is still time to change this.
The full build does not work for this branch. I am looking into this. Error:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The full build no longer works after further changes were added after this was already reviewed and previously approved.
The GH Action in the Mondo repo "Update Externally Provided Content" does still work so this PR will be left unmerged until the issue with the full build can be sorted out and fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build is failing.
ERROR duplicate_scoped_synonym | ||
ERROR file:tmp/mondo/src/sparql/qc/general/qc-duplicate-exact-synonym-no-abbrev.sparql | ||
ERROR file:tmp/mondo/src/sparql/qc/mondo/qc-proxy-merges.sparql | ||
ERROR file:tmp/mondo/src/sparql/qc/general/qc-related-exact-synonym.sparql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing this check since it looks like it is covered by the default ROBOT check of duplicate_scoped_synonym
already included in this file.
- This provides a necessary bug fix
* data build on main following mondo release * remove logs file
Overview
This PR:
Pre-merge checklist
Documentation
Was the documentation added/updated under
docs/
?QC
Was the full pipeline run before submitting this PR using
sh run.sh make build-mondo-ingest
on this branch (afterdocker pull obolibrary/odkfull:dev
), and no errors occurred?New Packages
Were any new Python packages added?
Were any other non-Python packages added?
PR Review and Conversations Resolved
Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?