This repository has been archived by the owner on Oct 15, 2021. It is now read-only.

Duplicate entry detection needed #32

Open
DoctorBud opened this issue Jun 5, 2017 · 2 comments


@DoctorBud
Member

A candidate user of TE for ontology and term editing asked whether duplicate entries are detected and prevented when adding or editing entries via TE. Here is the text of their question, and my response.

@marieALaporte:

When in the process of creating new terms do you check whether the term exists or not in the ontology?
I understand that INCA can be used in 2 cases : the creation of new terms and the creation of DP [Design Pattern?] of an existing term.
How do you tell the difference and deal with duplication of terms?

@DoctorBud:

These are good questions. Here's where we are now:

  • The INCA Table Editor (TE) does not itself look for existing terms, nor does it try to prevent duplicate terms.
  • It seems like a really good feature, though.
  • In terms of implementation, I see at least 3 places where we could do the checking:
      • Easiest: During editing/addition of a row, the TE can look for and alert the user to a 'local' duplicate by observing only the rows in the currently edited TSV. The user could be given the option to let the duplicate be entered, or we could just deny them that option.
      • More Complex: During editing/addition, the TE can look for and alert the user to the fact that their 'row' duplicates an existing term in some official 'shared' registry of terms. This is useful if you have users who are editing subsets of the entire ontology, in which case the above 'local' solution is insufficient. There are a few ways to implement this, but they all rely upon TE having access to the 'shared' registry, which could be made available over the network in various ways.
      • Easy, but out of scope for TE: The back-end code that turns TSV+YAML into OWL descriptions prior to merging these models into a shared ontology could (should?) have an additional responsibility to detect and report errors and duplicates, and eliminate duplicates if that is the desired behavior. The downside of relying only on the back-end detection is that the user (author) doesn't realize that they are duplicating terms until long after their editing session.
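To make the 'local' option concrete, here is a minimal sketch of a check that could run when a row is added. It is not TE code: it is written in Python for brevity, and the column names (`term`, `parent`), the sample TSV, and the helper names `normalize` and `find_local_duplicate` are all hypothetical. It only looks at the rows of the TSV being edited.

```python
import csv
import io


def normalize(value):
    """Collapse whitespace and case so trivially different spellings compare equal."""
    return " ".join(value.split()).casefold()


def find_local_duplicate(rows, candidate, key_columns):
    """Return the index of the first row matching candidate on key_columns, or None."""
    wanted = tuple(normalize(candidate.get(col) or "") for col in key_columns)
    for index, row in enumerate(rows):
        current = tuple(normalize(row.get(col) or "") for col in key_columns)
        if current == wanted:
            return index
    return None


# Hypothetical TSV content; the column names are illustrative only.
TSV_TEXT = "term\tparent\nleaf length\tleaf trait\nroot mass\troot trait\n"
rows = list(csv.DictReader(io.StringIO(TSV_TEXT), delimiter="\t"))

# A row the user is about to add; it differs from row 0 only in case and spacing.
new_row = {"term": "Leaf  Length", "parent": "leaf trait"}
duplicate_index = find_local_duplicate(rows, new_row, ["term", "parent"])
if duplicate_index is not None:
    print(f"New row duplicates existing row {duplicate_index}")
```

Whether a hit blocks the edit or only raises an alert is the policy question raised above; the function just reports the match and leaves that decision to the caller.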

I'm going to create a GitHub Issue from our conversation here and invite Chris (@cmungall) to comment on it.

@cmungall
Contributor

cmungall commented Jun 5, 2017

This should largely be a server-side responsibility for now. The dupes would be found by Travis. Later on we can do Owlery queries using reasoning to detect equivalence.

The only thing the client should do at the moment is warn about duplicate labels or defined class IRIs. I think this ticket can be closed and replaced by a more focused one.
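As a rough illustration of that client-side warning (a sketch only, in Python rather than the client's own language), the function below scans the table and reports any value that appears more than once in a label or defined-class column. The column names `defined_class` and `label`, the `EX:` identifiers, and the function name `duplicate_warnings` are assumptions made for the example. Values are compared exactly after trimming whitespace, since IRIs are case-sensitive.

```python
from collections import defaultdict


def duplicate_warnings(rows, columns=("defined_class", "label")):
    """Return one warning string per value shared by two or more rows in a column."""
    messages = []
    for column in columns:
        seen = defaultdict(list)  # value -> indexes of the rows that contain it
        for index, row in enumerate(rows):
            value = (row.get(column) or "").strip()
            if value:  # empty cells are never reported as duplicates
                seen[value].append(index)
        for value, indexes in seen.items():
            if len(indexes) > 1:
                messages.append(f"duplicate {column} '{value}' in rows {indexes}")
    return messages


# Hypothetical rows; the identifiers are placeholders, not real ontology terms.
table = [
    {"defined_class": "EX:0000001", "label": "leaf length"},
    {"defined_class": "EX:0000002", "label": "leaf length"},
    {"defined_class": "EX:0000001", "label": "root mass"},
]

for message in duplicate_warnings(table):
    print(message)
```

This only warns; it does not stop the edit, which leaves real duplicate and equivalence detection to the server side.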

@marieALaporte

I agree that it can be a server-side thing. But I think it would be helpful for the person adding a duplicate term to know that the term already exists.
