Duplicate entry detection needed #32

DoctorBud · 2017-06-05T18:21:35Z

A candidate user of TE for ontology and term editing asked whether duplicate entries are detected and prevented when adding or editing entries via TE. Here is the text of their question, and my response.

@marieALaporte:

When in the process of creating new terms do you check whether the term exists or not in the ontology?
I understand that INCA can be used in 2 cases: the creation of new terms and the creation of DP [Design Pattern?] of an existing term.
How do you tell the difference and deal with duplication of terms?

@DoctorBud:

These are good questions. Here's where we are now:

The INCA Table Editor (TE) does not itself look for existing terms, nor does it try to prevent duplicate terms.

It seems like a really good feature, though.

In terms of implementation, I see at least 3 places where we could do the checking:

Easiest: During editing/addition of a row, the TE can look for and alert the user to a 'local' duplicate by observing only the rows in the currently edited TSV. The user could be given the option to let the duplicate be entered, or we could just deny them that option.

More Complex: During editing/addition, the TE can check whether the user's 'row' duplicates an existing term in some official 'shared' registry of terms, and alert them if it does. This is useful if you have users who are editing subsets of the entire ontology, in which case the above 'local' solution is insufficient. There are a few ways to implement this, but they all rely upon the TE having access to the 'shared' registry, which can be arranged in various ways by making the shared registry available over the network.

Easy, but out of scope for TE: The back-end code that turns TSV+YAML into OWL descriptions prior to merging these models into a shared ontology could (should?) have an additional responsibility to detect and report errors and duplicates, and eliminate duplicates if that is the desired behavior. The downside of relying only on the back-end detection is that the user (author) doesn't realize that they are duplicating terms until long after their editing session.
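
To make the 'local' option concrete, here is a rough sketch in Python (not the TE's own code, and not how the TE currently works) of scanning only the rows of the currently edited TSV for repeated values. The function name `find_local_duplicates` and the column names in the usage note are hypothetical; which columns identify a term, and whether case and whitespace should be ignored, are assumptions that would need to be decided.

```python
import csv
import io
from collections import defaultdict


def find_local_duplicates(tsv_text, key_columns):
    """Return {key: [row numbers]} for keys that occur in more than one row.

    Hypothetical helper: a key is the tuple of values in key_columns,
    compared after trimming whitespace and lower-casing (an assumption).
    Row numbers are 1-based and do not count the header line.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen = defaultdict(list)
    for row_number, row in enumerate(reader, start=1):
        key = tuple((row.get(col) or "").strip().lower() for col in key_columns)
        if any(key):  # skip rows where every key column is empty
            seen[key].append(row_number)
    return {key: rows for key, rows in seen.items() if len(rows) > 1}
```

Called with a TSV whose header is, say, `iri` and `label`, and `key_columns=["label"]`, this reports each label that appears more than once together with the rows involved. The editor could then either warn and allow the entry or refuse it, as described in the first option.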

I'm going to create a GitHub Issue from our conversation here and invite Chris (@cmungall) to comment on it.

cmungall · 2017-06-05T19:16:47Z

This should largely be a server-side responsibility for now. The dupes would be found by Travis. Later on we can do Owlery queries using reasoning to detect equivalence.

The only thing the client should do at the moment is warn about duplicate labels or defined class IRIs. I think this ticket can be closed and replaced by a more focused one.
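
A client-side warning of that kind might look roughly like the following Python sketch. It is only an illustration under stated assumptions: `duplicate_warnings` is a hypothetical name, the column names `defined_class` and `label` are placeholders for whatever the TSV actually uses, IRIs are compared exactly, and labels are compared case-insensitively.

```python
def duplicate_warnings(rows, iri_column="defined_class", label_column="label"):
    """Return warning strings for rows that repeat an earlier IRI or label.

    Hypothetical helper: rows is a list of dicts, one per table row.
    Empty values are ignored. Nothing is rejected; the caller only warns.
    """
    warnings = []
    first_seen = {}  # (kind, value) -> number of the first row that used it
    for index, row in enumerate(rows, start=1):
        checks = (
            ("IRI", (row.get(iri_column) or "").strip()),
            ("label", (row.get(label_column) or "").strip().casefold()),
        )
        for kind, value in checks:
            if not value:
                continue
            earlier = first_seen.setdefault((kind, value), index)
            if earlier != index:
                warnings.append(f"row {index}: {kind} duplicates row {earlier}")
    return warnings
```

Because it only returns messages, the decision about blocking the edit stays with the server side, in line with the division of responsibility suggested here.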

marieALaporte · 2017-06-05T20:29:32Z

I agree that it can be a server-side thing. But I think it would be helpful for the person adding a duplicated term to know that the term already exists.

DoctorBud added the enhancement and question labels on Jun 5, 2017

DoctorBud mentioned this issue Jul 5, 2017

enforce agreement between existing terms and labels where ever applicable #4

Open
