-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create Semsimian object with concerned predicates only. #631
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pinned too tightly
src/oaklib/implementations/semsimian/semsimian_implementation.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the logic is quite right. Consider the case where the client does:
sim1 = adapter.pairwise_similarity(x,y, predicates=[IS_A])
sim2 = adapter.pairwise_similarity(x,y, predicates=[IS_A, PART_OF])
<some kind of comparison of the two...>
I believe that sem2 will give the same result. It will silently use the same cached filtered closure, and the user will be none the wiser.
If we do thing that the standard use case of semsimian is that you will never want to switch predicate sets in one session (not ideal but acceptable in the short term), then it is vital that the above code will fail fast on the second call and inform the user that the cached object is already frozen on the IS_A-only closure.
However, I think a more standard solution is that you have a cache of semsimian objects keyed by the tuple of the ordered list of predicates (ordered to avoid unnecessary re-computation as [IS_A, PART_OF] is the same as [PART_OF, IS_A]).
e.g. one of the object attributes would be
semsimian_cache: Dict[Tuple, Semsimian] = None
Then create_pairwise_similarity_output_object
would turn predicates
to a tuple tuple(sorted(predicates))
and use that as a key.
There is a danger here that the client could exhaust memory by iterating through all combos of 100 predicates in uberon... but I think that is acceptable for now (later on we could have a system that did more active cache management).
Codecov ReportPatch coverage:
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## main #631 +/- ##
==========================================
- Coverage 77.08% 77.04% -0.04%
==========================================
Files 245 247 +2
Lines 28415 28540 +125
==========================================
+ Hits 21903 21990 +87
- Misses 6512 6550 +38
☔ View full report in Codecov by Sentry. |
src/oaklib/implementations/semsimian/semsimian_implementation.py
Outdated
Show resolved
Hide resolved
Because changes in this branch are necessary for Monarch use cases and will not be impacted by Windows-specific errors, I'm going to consider this state sufficient for an RC release and will open a fresh PR for the Windows issue. |
termset-similarity
command returnsTermSetPairwiseSimilarity
object