
Fixes all duplicate scoped synonym #2332

Draft: wants to merge 6 commits into master

anitacaron
Collaborator

@anitacaron anitacaron commented Mar 1, 2022

Two cases were fixed:

  1. a scoped synonym has the same text as the term label
  2. the same text appears in two differently scoped synonyms

In both cases the synonym was changed to oboInOwl:hasExactSynonym.

The following SPARQL update query was used (the fixes for the two cases are combined here):

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

DELETE {
  ?cls ?synonym ?value .
  ?x a owl:Axiom ;
     owl:annotatedSource ?cls ;
     owl:annotatedProperty ?synonym ;
     owl:annotatedTarget ?value ;
     ?p ?z .
}
INSERT {
  ?cls oboInOwl:hasExactSynonym ?value .
  ?x a owl:Axiom ;
     owl:annotatedSource ?cls ;
     owl:annotatedProperty oboInOwl:hasExactSynonym ;
     owl:annotatedTarget ?value ;
     ?p ?z .
}
WHERE {
  VALUES ?synonym { oboInOwl:hasExactSynonym oboInOwl:hasRelatedSynonym oboInOwl:hasNarrowSynonym oboInOwl:hasBroadSynonym }
  VALUES ?synonym2 { oboInOwl:hasExactSynonym oboInOwl:hasRelatedSynonym oboInOwl:hasNarrowSynonym oboInOwl:hasBroadSynonym }
  ?cls rdfs:label ?label .
  ?cls ?synonym ?value .
  ?cls ?synonym2 ?value .
  OPTIONAL {
    ?x a owl:Axiom ;
       owl:annotatedSource ?cls ;
       owl:annotatedProperty ?synonym ;
       owl:annotatedTarget ?value ;
       ?p ?z .
  }
  FILTER (str(?label) = str(?value))
  FILTER (?synonym != ?synonym2)
  FILTER (!isBlank(?cls) && regex(str(?cls), "^http://purl.obolibrary.org/obo/UBERON_"))
}

Fixes #2305

@shawntanzk
Collaborator

converting to a draft so it doesn't accidentally get merged - this needs to be reviewed before merging. I'll post a diff here to make reviewing easier.

@shawntanzk shawntanzk marked this pull request as draft March 1, 2022 20:27
@shawntanzk shawntanzk requested a review from dosumis March 1, 2022 20:34
@matentzn
Contributor

matentzn commented Mar 2, 2022

@anitacaron can you check in the update queries (src/sparql/update)? We should do that to make it easier to do similar things in the future. Also, I think we should probably split the two problems into separate update queries.

I think you don't capture case 1 entirely because you only look at terms that have at least two synonyms?
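
One way to read this point: a query for case 1 alone would drop the second-synonym pattern, so that a term with a single non-exact synonym equal to its label is still matched. A minimal sketch, assuming the same prefixes and UBERON filter as the query in the PR description. It is not the content of the files checked in under src/sparql/update, and as a design choice it rewrites only the owl:annotatedProperty triple of a matching axiom annotation, because a ?p ?z wildcard in the INSERT template would also re-insert the old owl:annotatedProperty value:

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

# Illustrative sketch for case 1 only: a non-exact synonym whose text
# equals the term label is re-scoped to hasExactSynonym.
DELETE {
  ?cls ?synonym ?value .
  ?x owl:annotatedProperty ?synonym .
}
INSERT {
  ?cls oboInOwl:hasExactSynonym ?value .
  ?x owl:annotatedProperty oboInOwl:hasExactSynonym .
}
WHERE {
  # hasExactSynonym is left out here: rewriting it to itself would be a no-op.
  VALUES ?synonym { oboInOwl:hasRelatedSynonym oboInOwl:hasNarrowSynonym oboInOwl:hasBroadSynonym }
  ?cls rdfs:label ?label .
  ?cls ?synonym ?value .
  # The axiom annotation (e.g. carrying a dbxref) is optional.
  OPTIONAL {
    ?x a owl:Axiom ;
       owl:annotatedSource ?cls ;
       owl:annotatedProperty ?synonym ;
       owl:annotatedTarget ?value .
  }
  FILTER (str(?label) = str(?value))
  FILTER (!isBlank(?cls) && regex(str(?cls), "^http://purl.obolibrary.org/obo/UBERON_"))
}
```

When no axiom annotation exists, ?x stays unbound and the template triples that mention it are skipped for that solution, so only the plain synonym triple is rewritten.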

@anitacaron
Collaborator Author

@matentzn I used two queries to make the changes. I posted the two combined here because they differ by only a few lines.
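
For illustration, a query for case 2 alone could keep the two-synonym pattern from the PR description and drop the label comparison. This is a hedged sketch under that assumption, not the contents of the files under src/sparql/update. It touches only the owl:annotatedProperty triple of any matching axiom annotation, so other annotations on the axiom are left in place:

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

# Illustrative sketch for case 2 only: the same text appears under two
# different synonym scopes; every non-exact occurrence becomes exact.
DELETE {
  ?cls ?synonym ?value .
  ?x owl:annotatedProperty ?synonym .
}
INSERT {
  ?cls oboInOwl:hasExactSynonym ?value .
  ?x owl:annotatedProperty oboInOwl:hasExactSynonym .
}
WHERE {
  # ?synonym is the non-exact scope to rewrite; ?synonym2 is any other scope.
  VALUES ?synonym { oboInOwl:hasRelatedSynonym oboInOwl:hasNarrowSynonym oboInOwl:hasBroadSynonym }
  VALUES ?synonym2 { oboInOwl:hasExactSynonym oboInOwl:hasRelatedSynonym oboInOwl:hasNarrowSynonym oboInOwl:hasBroadSynonym }
  ?cls ?synonym ?value .
  ?cls ?synonym2 ?value .
  OPTIONAL {
    ?x a owl:Axiom ;
       owl:annotatedSource ?cls ;
       owl:annotatedProperty ?synonym ;
       owl:annotatedTarget ?value .
  }
  FILTER (?synonym != ?synonym2)
  FILTER (!isBlank(?cls) && regex(str(?cls), "^http://purl.obolibrary.org/obo/UBERON_"))
}
```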

src/sparql/update/remove-duplicate-synonyms.ru (outdated, resolved)
src/sparql/update/remove-scoped-synonym-label.ru (outdated, resolved)
@shawntanzk
Collaborator

Notice that this adds an additional has_exact_synonym. Can it instead just remove the duplicate and move the dbxref annotation to the existing exact synonym? (Not sure how difficult that is.)
[Screenshot 2022-03-03 at 13 17 13]

@matentzn
Contributor

matentzn commented Mar 3, 2022

@shawntanzk don't worry about that, we can use normalisation for contracting these duplicate synonyms! No need to change the SPARQL I think.

@matentzn matentzn marked this pull request as ready for review March 3, 2022 20:03
@matentzn
Contributor

matentzn commented Mar 3, 2022

Great job! Assuming this is done?

@shawntanzk
Collaborator

Assuming this is done?

In the call, there was an agreement that this needs to be reviewed, can't remember by whom - @dosumis @addiehl? I'll also look through it tomorrow :)

@matentzn
Contributor

matentzn commented Mar 3, 2022

I think the only thing that can be reviewed is whether the synonyms are all exact. Many are not. But this is not really a hill you want to die on here. It will take dozens of curation hours to fix something that has been like this for 15 years, and no one complained so far. I would not get bogged down by details and just focus on QC and infrastructure so we can actually ensure that stuff that is added moving forward is not broken.

@anitacaron anitacaron self-assigned this Mar 14, 2022
@matentzn
Contributor

@shawntanzk can we merge this? Seems a waste to leave this open if no one is ever going to review it.

@shawntanzk
Collaborator

honestly not super sure, would rather bring this up in a call - people had quite strong opinions about this last time it was discussed

@matentzn matentzn added the "blocked" (blocked by another issue) label Mar 25, 2022
@matentzn matentzn marked this pull request as draft March 25, 2022 17:40
Labels: blocked (blocked by another issue), tech

Successfully merging this pull request may close these issues.

[Typo/Error] Synonym appears as both exact AND broad or related for the same term
3 participants