-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GPAD/GPI spec comments #3031
Comments
@vanaukenk Notes from GAF and GPAD diving. headersCounter to my intuition, headers are pretty wild across the board. I'm honestly not sure we should bother with supporting the past here and feel that just setting decent rules now and enforcing them would be a cleaner way forward. Things that would likely be scanned as a "property symbol": All headers: https://gist.github.com/kltm/d28a487b3a009c38dda93293808159a5#file-all-headers-txt Given this, I propose a tight definition for non-comment comment lines in the header:
@balhoff @vanaukenk Any thoughts here? taxonA little easier: if used, only a single '|'; no instances of anything else. |
@kltm |
I have a spec question/concern: But we also have this pattern in the extensions. Does that mean we just have W x E number of internal annotations for each pipe separated element in both withfrom and extensions? With, Ext |
Since the with field represents a single piece of evidence, shouldn't it be considered a single annotation? It's just that the evidence is supported by this or that or this or that rather than this and that and this and that. It's an interesting question on where to make the slices. |
Can we explain more specifically what this means? Do you mean you can only have pipes or commas in a single annotation? If you mix them together, where are the "parentheses" exist? Like, is A,B|C,D|E: |
@dougli1sqrd |
So if we had this completely made up annotation: ace-1 involved_in neurotransmission ace-2, ace-3|ace-3, ace-4 occurs_in(motor neuron), part_of(response to odorant)|occurs_in(body wall muscle cell), part_of(response to touch) It could be split into four annotations as follows: ace-1 involved_in neurotransmission ace-2, ace-3 occurs_in(motor neuron), part_of(response to odorant) ace-1 involved_in neurotransmission ace-3, ace-4 occurs_in(motor neuron), part_of(response to odorant) ace-1 involved_in neurotransmission ace-2, ace-3 occurs_in(body wall muscle cell), part_of(response to touch) ace-1 involved_in neurotransmission ace-3, ace-4 occurs_in(body wall muscle cell), part_of(response to touch) |
I'm now wondering if we should rethink this. Allowing for both pipes and commas seems to open up confusion that might lead to what I saw in our original annotations. If there is a delimiter that represent separate annotations, wouldn't it be better to just create the separate annotations? I think when @dustine32 imports conventional annotations, he parses these complex structures into separate 'annotations' anyways. How will we create files as we move forward from GO-CAM exports? Will we need to devise an algorithm for creating pipe versus comma delimited values in one very complex annotation? |
IIRC, introducing pipes into the With/From field was aimed at time-saving/efficiency wrt curation. I can't otherwise think of a reason pipe-separated With/From entries couldn't be separate annotations. For annotation extensions, though, my recollection was that there was concern that individual annotations should not just differ by extensions, partly because of concern about how analysis tools, most (all?) of which don't take extensions into account, would handle seemingly 'duplicate' annotations. Perhaps this is no longer a real concern and we should change our SOP on this. |
We should re-examine. If I understand correctly when we round trip, they will come out as individual annotations. |
@vanaukenk, I think we are thinking along the same lines, but we need to take a closer look at this. Does it give us an advantage to split out the with/from field? Does it give us an advantage to split out the annotation extensions? For the latter, several pipe-separated annotation extensions could in fact lead to different primary inferred annotations, so there is a distinct advantage. |
I can't think of an advantage to splitting out pipe-separated with/from fields. |
@vanaukenk @ukemi @kltm for GPI columns 2, 3, and 4: The values are generally text. Columns 2 and 3 are specified as "xxxx", and column 4 is "Label" (undefined). Could these all be defined as |
MGI:109346 Gene Symbol Nkx1-1 (Column 2), synonym Nkx-1.1 (column 4) |
@ukemi okay I expected as much. :-) |
@balhoff This is some of where the "ASCII, except for non-printing characters, tabs, new lines, control characters, quotation marks, or pipe" comes from (and we even said that quotes are probably fine too on a call). Doesn't this also bump into the utf-8 question? The future may indeed look different: geneontology/helpdesk#265 |
In the GPAD 2 spec I noticed that the symbol |
You are absolutely right about this, and we are working to move whatever relations we can from gorel to RO. That said, there are some gorel relations, like 'regulates o occurs in', that will have to stick around for the foreseeable future until we can move all of the annotations that use it into GO-CAM models. I'm open to suggestions/thoughts on how to make any of this easier. |
Maybe a topic for the software meeting? |
Hi @ukemi, @kltm, @vanaukenk, etc! Taking a look at the GPI 2.0 spec file, I have some more questions actually. In the spec file, we have a grammar for each entity line here: https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md#gp-entities
But in the table below, under https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md#gpi-columns it appears that there are 11 possible columns. Should the above In the actual column specifications, for columns 3 and 5, the cardinality is |
I think the line below should actually say Col_11.
Some of the specs got copied over from a previous iteration and that update got missed. For columns 3 and 5, I think we'd follow a similar use of '|' as a separator. I can update the spec if what I've said here makes sense to you. |
@vanaukenk Okay great! Thanks for the clarification. That makes sense to me. |
Updated specification. Thanks @dougli1sqrd |
Thanks! Sorry, just noting that column 5 I think now reads like it can accept an empty field: |
Thanks @dougli1sqrd You're right; I'll fix that. |
The time for commenting is over. If issues need to be discussed, please open a new ticket. |
I've been working on a parser and object model for the new GPAD/GPI specs, and just wanted to report a few areas in the spec that could be clarified:
GPI_Header ::= '!gpi-version: 1.1' nl
should be version 2.0GPAD_Doc ::= GPAD_Header Annotation*
but no correspondingGPI_Doc
productionGPAD_Header_Line
:'!' Property_Symbol ':' Space* Value nl
Property_Symbol ::= ID | Alphanumeric
, sinceID
must contain a:
, how does a colon in the ID interact with the:
separating the property from the value?Value
doesn't seem to be defined. Should it follow theProperty_Value
defined for property-value pairs within annotation lines? If so it seems it can also contain a:
.Label
is not defined:DB_Object_Synonyms ::= [Label] ('|' Label)*
Taxon_ID
is not defined:DB_Object_Taxon ::= NCBITaxon:[Taxon_ID]
. Should it be one or more digits?Property_Value
could be more explicit. Current:Property_Value ::= (AnyChar - ('=' | '|' | nl))
ASCII, except for non-printing characters, tabs, new lines, control characters, quotation marks, or pipe
=
and|
andnl
are in the forbidden list, but not the other characters mentionedRelational_Expression ::= Relation_ID '(' ID ')'
should the term be an OBO_ID (or I guess staying with ID allows relations to gene product identifiers)?Interacting_taxon_ID
is said to be cardinality 0 or greater, but this isn't in the grammar:Interacting_taxon_ID ::= NCBITaxon:[Taxon_ID]
I wonder if it would be possible to consolidate the grammar definitions a little, as in the SPARQL grammar or in the OWL Functional Syntax grammar. I really enjoy reading those. :-)
The text was updated successfully, but these errors were encountered: