Create Semsimian object with concerned predicates only. #631

hrshdhgd · 2023-08-07T15:57:07Z

Create semsimian object only with concerned predicates
Pydantic version < 2
Semsimian anchored at 0.2.0
Add in-memory caching for better efficiency
termset-similarity command returns TermSetPairwiseSimilarity object

pyproject.toml

cmungall

pinned too tightly

tox.ini

src/oaklib/implementations/semsimian/semsimian_implementation.py

cmungall

I don't think the logic is quite right. Consider the case where the client does:

    sim1 = adapter.pairwise_similarity(x, y, predicates=[IS_A])
    sim2 = adapter.pairwise_similarity(x, y, predicates=[IS_A, PART_OF])
    <some kind of comparison of the two...>

I believe that sim2 will give the same result as sim1. It will silently reuse the same cached filtered closure, and the user will be none the wiser.

If we do think that the standard use case of semsimian is that you will never want to switch predicate sets in one session (not ideal but acceptable in the short term), then it is vital that the above code fails fast on the second call and informs the user that the cached object is already frozen on the IS_A-only closure.
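A minimal sketch of the fail-fast behaviour described above, assuming predicates are plain CURIE strings. `FrozenPredicateGuard` and its `check` method are hypothetical names for illustration; they are not part of oaklib or semsimian.

```python
from typing import Iterable, Optional, Tuple


class FrozenPredicateGuard:
    """Hypothetical helper: remembers the first predicate set and rejects any other."""

    def __init__(self) -> None:
        # Predicate set the session is frozen on; None until the first call.
        self._frozen: Optional[Tuple[str, ...]] = None

    def check(self, predicates: Iterable[str]) -> Tuple[str, ...]:
        # Normalise so that order and duplicates do not matter.
        key = tuple(sorted(set(predicates)))
        if self._frozen is None:
            self._frozen = key
        elif key != self._frozen:
            # Fail loudly instead of silently reusing the old closure.
            raise ValueError(
                f"Closure already built for predicates {self._frozen}; "
                f"cannot switch to {key} in the same session."
            )
        return key
```

Under this sketch the second call in the example above would raise a `ValueError` rather than return a score computed on the IS_A-only closure.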

However, I think a more standard solution is that you have a cache of semsimian objects keyed by the tuple of the ordered list of predicates (ordered to avoid unnecessary re-computation as [IS_A, PART_OF] is the same as [PART_OF, IS_A]).

e.g. one of the object attributes would be

    semsimian_cache: Dict[Tuple, Semsimian] = None

Then create_pairwise_similarity_output_object would convert the predicates to a tuple, tuple(sorted(predicates)), and use that as the key.

There is a danger here that the client could exhaust memory by iterating through all combos of 100 predicates in uberon... but I think that is acceptable for now (later on we could have a system that did more active cache management).
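The cache proposed above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `PredicateKeyedCache` and the `factory` callable are hypothetical, and the factory stands in for whatever actually builds a Semsimian object for a given predicate set.

```python
from typing import Callable, Dict, Iterable, Tuple

PredicateKey = Tuple[str, ...]


class PredicateKeyedCache:
    """Hypothetical cache: one expensive object per distinct predicate set."""

    def __init__(self, factory: Callable[[PredicateKey], object]) -> None:
        # factory is an assumed callable that builds the object for one key.
        self._factory = factory
        self._cache: Dict[PredicateKey, object] = {}

    def get(self, predicates: Iterable[str]) -> object:
        # Sorting makes [IS_A, PART_OF] and [PART_OF, IS_A] share one entry.
        key = tuple(sorted(predicates))
        if key not in self._cache:
            self._cache[key] = self._factory(key)
        return self._cache[key]
```

Nothing is ever evicted in this sketch, so the memory concern raised above still applies; a bounded or least-recently-used policy would be one way to address it later.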

codecov-commenter · 2023-08-07T17:24:44Z

Codecov Report

Patch coverage: 82.35% and project coverage change: -0.04% ⚠️

Comparison is base (4a306aa) 77.08% compared to head (ed804dd) 77.04%.
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #631      +/-   ##
==========================================
- Coverage   77.08%   77.04%   -0.04%     
==========================================
  Files         245      247       +2     
  Lines       28415    28540     +125     
==========================================
+ Hits        21903    21990      +87     
- Misses       6512     6550      +38

Files Changed	Coverage Δ
...lementations/semsimian/semsimian_implementation.py	`85.88% <70.00%> (-9.44%)`	⬇️
tests/test_implementations/test_pronto.py	`90.93% <100.00%> (-0.86%)`	⬇️
...t_implementations/test_semsimian_implementation.py	`87.80% <100.00%> (+2.29%)`	⬆️

... and 15 files with indirect coverage changes


src/oaklib/implementations/semsimian/semsimian_implementation.py

caufieldjh · 2023-08-18T16:03:28Z

Because changes in this branch are necessary for Monarch use cases and will not be impacted by Windows-specific errors, I'm going to consider this state sufficient for an RC release and will open a fresh PR for the Windows issue.

Create Semsimian object with concerned predicates only.

cd5f78f

hrshdhgd requested a review from cmungall August 7, 2023 15:57

hrshdhgd marked this pull request as ready for review August 7, 2023 15:57

poetry dependency versions fixed

55541d8

hrshdhgd marked this pull request as draft August 7, 2023 16:09

hrshdhgd removed the request for review from cmungall August 7, 2023 16:09

hrshdhgd added 2 commits August 7, 2023 11:12

Added term_pairwise_similarity_attributes

a919cff

formatted and added gilda as test depend. for tox

4b758e3

hrshdhgd marked this pull request as ready for review August 7, 2023 16:22

hrshdhgd mentioned this pull request Aug 7, 2023

Pinning pydantic to be <2 until we are sure the full stack works. See #628 Bumping semsimian to 0.1.20 #630

Merged

cmungall reviewed Aug 7, 2023

pyproject.toml (outdated, resolved)

cmungall requested changes Aug 7, 2023

hrshdhgd added 2 commits August 7, 2023 12:04

removed tight pinning.

7eb968e

lock file updted

a24a7af

cmungall reviewed Aug 7, 2023

tox.ini (resolved)

cmungall reviewed Aug 7, 2023

src/oaklib/implementations/semsimian/semsimian_implementation.py (outdated, resolved)

cmungall requested changes Aug 7, 2023

cmungall reviewed Aug 7, 2023

src/oaklib/implementations/semsimian/semsimian_implementation.py (outdated, resolved)

hrshdhgd added 11 commits August 7, 2023 14:31

simplified logic

bd142b2

simplified logic

01d5593

Set up caching

3cb0f09

formatted

faf6763

addressed if slug is a path

3700228

formatted

f979c9a

Merge branch 'main' into semsim-obj-create

147b680

potential windows gh action error fix

0893574

random space added

2edc989

removed

a453a23

Merge branch 'main' into semsim-obj-create

fc23b4b

hrshdhgd requested a review from sierra-moxon August 9, 2023 17:09

hrshdhgd added 21 commits August 9, 2023 16:53

updating semsimian to release candidate 0.2.0-rc1

cf69712

Added temp flag --score-only and termset_pairwise_similarity_temp

37dac2b

formatted

6004083

refactored

ac264d7

bumped semsimian dependency to rc2

6c79880

docstring edited

682cc85

notebook added

8328bc9

full termset pairwise similarity returned now!

1edb604

quicker semsim!

cd7a817

Added additional Windows check

ede0348

testing QC

c8bee7f

trying latest poetry version as exprmnt

fbf4969

rolled back

3c8c746

removing os constraint for testing

3cd05df

testing windows in semsimian implementation

129d87d

undo windows addition and update pip in gh

c0c11c2

removed unnecessary import

bcc691a

semsimian updated

b1c9c86

removed commented code & added warning for labels

2cc38cf

removed Windows path fix here because rust handles it

45a4fe9

undid commenting

9d675f3

caufieldjh approved these changes Aug 18, 2023

caufieldjh added 2 commits August 18, 2023 13:23

Skip Semsimian tests on windows for now

dd3d704

Lintin

ed804dd

cmungall approved these changes Aug 18, 2023

caufieldjh merged commit 529cdb8 into main Aug 18, 2023
5 checks passed

caufieldjh deleted the semsim-obj-create branch August 18, 2023 19:13

caufieldjh mentioned this pull request Aug 18, 2023

Resolve sqlite handling differences when using Semsimian and when on Windows #638

Open
