Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in calculating all_by_all_pairwise_similarity #800

Merged
merged 4 commits into from
Aug 20, 2024
Merged

Bug in calculating all_by_all_pairwise_similarity #800

merged 4 commits into from
Aug 20, 2024

Conversation

hrshdhgd
Copy link
Collaborator

@hrshdhgd hrshdhgd commented Aug 19, 2024

When len(ancestor_set) == 0 the jacquard and AIC scores are explicitly set to 0 which is incorrect.

Thank you @souzadevinicius for highlighting this!

Also updated semsimian version dependency to latest

fixes monarch-initiative/semsimian#133 (comment)

@hrshdhgd hrshdhgd changed the title Updated semsimian version Bug in calculating all_by_all_pairwise_similarity Aug 19, 2024
@caufieldjh
Copy link
Collaborator

OK, so what was happening was that the provided nodes didn't have ancestors (at least not in the provided set), so their metrics were set to zero. Should this produce an error and/or should there be an attempt (in oaklib or semsimian) to identify the potential ancestor? It's reasonable to expect that the user should only need to provide IC values for nodes of interest, even if the ancestors are required to calculate pairwise semsim.

@hrshdhgd hrshdhgd marked this pull request as ready for review August 20, 2024 15:49
@justaddcoffee
Copy link
Collaborator

Could the 3 of us do a code review on this, maybe tomorrow if we are all around?

OK, so what was happening was that the provided nodes didn't have ancestors (at least not in the provided set), so their metrics were set to zero.

If I understand the code correctly, we check here if the two terms have a common ancestor (which we call ancestor_set, but I think is a bad name - it's a tuple of ancestor_id and the IC value of this ancestor). I think the correct behavior is what we do here, which is to set the common ancestor ID in the similarity object if we have it, otherwise set it to OWL_THING or some such.

I think the remaining question is why so many (all?) of Vinicius's pairs of ontology terms didn't have a common ancestor? This seems like a bug somewhere, either in OAK, semsimian, or Phenio.

@cmungall cmungall merged commit 7602658 into main Aug 20, 2024
9 checks passed
@hrshdhgd hrshdhgd deleted the issue-133 branch August 20, 2024 20:44
@caufieldjh
Copy link
Collaborator

I think the remaining question is why so many (all?) of Vinicius's pairs of ontology terms didn't have a common ancestor? This seems like a bug somewhere, either in OAK, semsimian, or Phenio.

They certainly have common ancestors, and in PHENIO the common ancestor of an HP term and an MP term will almost always be a UPHENO term. But somehow if the UPHENO term isn't included in the provided IC map it doesn't get included as an ancestor, even if it's present in the source ontology - perhaps because we don't mix user-provided IC maps with calculated ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential problem during SEMSIM calculation with Information Content
4 participants