Refactor: mondo.sssom.config.yml.subject_prefixes -> metadata/SOURCE.yml #483

Open
joeflack4 opened this issue Mar 29, 2024 · 6 comments
Labels
code quality House keeping

@joeflack4
Contributor

Overview

The subject_prefixes section of mondo.sssom.config.yml exists, as far as I'm aware, to tell us (and lexmatch in particular) which prefixes pertain to term classes. This lets lexmatch filter on, and match only, the correct terms. I'm not 100% sure, but I think it might be best to move this information into metadata/SOURCE.yml.
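To make that filtering role concrete, here is a minimal Python sketch. It assumes subject_prefixes is a plain list of CURIE prefixes and that candidate terms are identified by CURIEs such as DOID:0050700. The function name and sample IDs are hypothetical; this is not lexmatch's actual implementation.

```python
from typing import Iterable, List


def filter_by_prefixes(curies: Iterable[str], subject_prefixes: Iterable[str]) -> List[str]:
    """Keep only CURIEs whose prefix (the part before the first ':') is in subject_prefixes.

    Hypothetical helper illustrating the kind of filtering a subject_prefixes
    list enables; it is not taken from lexmatch.
    """
    allowed = set(subject_prefixes)
    kept = []
    for curie in curies:
        prefix, sep, _local_id = curie.partition(":")
        # Skip strings that are not CURIEs, and CURIEs whose prefix is not allowed.
        if sep and prefix in allowed:
            kept.append(curie)
    return kept


if __name__ == "__main__":
    # Assumed example values; a real list would come from mondo.sssom.config.yml.
    subject_prefixes = ["DOID", "OMIM"]
    terms = ["DOID:0050700", "OMIM:100100", "UBERON:0002107", "not-a-curie"]
    print(filter_by_prefixes(terms, subject_prefixes))  # ['DOID:0050700', 'OMIM:100100']
```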

Questions

1. Redundancy?

This section feels redundant, or perhaps misplaced. There are now several places where prefixes need to be added when we add a new source to Mondo. Some of these are automated via curies, e.g. the OBO context JSON and now the SemSQL prefixes.csv. But I think this needs to be handled separately from curies. Perhaps this isn't really a question: as I think about it, I don't see where this is redundant. The subject_prefixes list represents the subset of an ontology's prefixes (those for classes) that we want to match on.

2. Best location?

I was thinking that the best location for this might be each ontology's metadata/*.yml file. We already have a prefix_map and a base_prefix_map (the subset of prefix_map that is 'owned' by the ontology) there, so that seems like a good place to put this. Isn't it the case that the prefixes in subject_prefixes in mondo.sssom.config.yml represent things we want to match against, which would only be classes? If so, shouldn't we have a class_prefixes section in metadata/*.yml and put them there instead? Intuitively, that also just feels like the best location to me. Though if this is only going to be used by mondo.sssom.config.yml, maybe it is indeed best where it is. Not saying this is a big priority; as the saying goes, if it's not broke, don't fix it. Thoughts?
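A minimal sketch of how the proposal might work, assuming each metadata/SOURCE.yml has already been parsed into a dict (e.g. with a YAML library) and gains a hypothetical class_prefixes key next to prefix_map and base_prefix_map. The function merges those per-source lists into the single list that subject_prefixes holds today. The key name, the function name, the sample sources, and the placeholder URIs are all assumptions for illustration, as is the sanity check that every class prefix also appears in prefix_map.

```python
from typing import Any, Dict, List


def collect_class_prefixes(sources: Dict[str, Dict[str, Any]]) -> List[str]:
    """Merge the hypothetical `class_prefixes` lists from parsed metadata/SOURCE.yml files.

    `sources` maps a source name (e.g. "doid") to that file's parsed YAML content.
    The result is a sorted, de-duplicated list: the role subject_prefixes plays today.
    """
    collected = set()
    for name, meta in sources.items():
        class_prefixes = meta.get("class_prefixes", [])
        prefix_map = meta.get("prefix_map", {})
        # Assumed sanity check: a class prefix should also be declared in prefix_map.
        unknown = [p for p in class_prefixes if p not in prefix_map]
        if unknown:
            raise ValueError(f"{name}: class_prefixes missing from prefix_map: {unknown}")
        collected.update(class_prefixes)
    return sorted(collected)


if __name__ == "__main__":
    # Hypothetical parsed contents; the URIs are placeholders, not real metadata.
    sources = {
        "doid": {
            "prefix_map": {"DOID": "http://example.org/DOID_"},
            "base_prefix_map": {"DOID": "http://example.org/DOID_"},
            "class_prefixes": ["DOID"],
        },
        "omim": {
            "prefix_map": {"OMIM": "http://example.org/omim/"},
            "base_prefix_map": {"OMIM": "http://example.org/omim/"},
            "class_prefixes": ["OMIM"],
        },
    }
    print(collect_class_prefixes(sources))  # ['DOID', 'OMIM']
```

Under this sketch, lexmatch would consume the merged list instead of reading subject_prefixes, which is exactly the extra metadata input the refactor would introduce.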

@joeflack4 joeflack4 self-assigned this Mar 29, 2024
@joeflack4 joeflack4 added the code quality House keeping label Mar 29, 2024
@joeflack4
Contributor Author

@matentzn I created this based on #434 (comment). I agree this is best as a new issue.
@twhetzel We discussed this in a call. No need to worry about it; this is super low priority.
@hrshdhgd You're the most relevant person to discuss this with me, but this is super low priority.

@joeflack4 joeflack4 mentioned this issue Mar 29, 2024
@hrshdhgd
Member

hrshdhgd commented Mar 29, 2024

So you're basically suggesting just to rename the file? Or to extract out the subject_prefixes part to a new file?

@joeflack4
Contributor Author

Suggesting this one:

or extract out the subject_prefixes part to a new file

Pros:

  • Everything related to understanding a source's prefixes, and the purposes they serve, remains in that source's metadata/SOURCE.yml

Cons:

  • Requires refactoring of lexmatch, and also introduces new complexity (adding metadata/SOURCE.yml as another input)

@hrshdhgd
Member

If it's as minor as that, go ahead and do it, as long as it doesn't break anything 😄. I don't have strong opinions.

@matentzn
Member

matentzn commented Apr 4, 2024

I think subject_prefixes may be historical and only relevant in the mondo repo. Feel free to remove it and see if the pipeline still runs, as @hrshdhgd suggests!

@joeflack4
Contributor Author

It wouldn't be a removal; it'd be a refactor. It's low priority, but I'm glad we came to a determination here! Good to know that this is also not expected to be part of some standard SSSOM config.


3 participants