properties for LabProtocol and LabProcess #675

Open
marco-brandizi opened this issue Jun 9, 2024 · 2 comments

Comments

@marco-brandizi

In the current draft description, LabProtocol has the properties bioSample, sample, computationalTool, labEquipment, reagent. LabProcess doesn't have any of these properties.

As far as I know, this can be a problem when multiple protocol applications (i.e., LabProcess instances) are variants of the same protocol/plan (i.e., a LabProtocol instance, linked via executesLabProtocol with n-1 cardinality), especially in the case of bioSample. For instance, suppose that essentially the same treatment (protocol) is administered to all the biosamples in two treatment groups, and one group uses one reagent while the other group uses a different one (similar examples could be made with two versions of a software product or of a piece of lab machinery).
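To make the scenario concrete, here is a minimal sketch that models the two treatment groups as plain Python dictionaries shaped like JSON-LD. Only the property names (reagent, executesLabProtocol) come from the draft; the identifiers (#treatment, #group-a, #reagent-x, ...) and the helper `reagents_for` are hypothetical. Because reagent is only allowed on LabProtocol, both processes can only reach the same reagent list.

```python
# Hypothetical JSON-LD-like records; only the property names come from the draft.
protocol = {
    "@id": "#treatment",
    "@type": "LabProtocol",
    # reagent is allowed on LabProtocol only, so both groups' reagents end up here.
    "reagent": ["#reagent-x", "#reagent-y"],
}

processes = [
    {"@id": "#group-a", "@type": "LabProcess", "executesLabProtocol": "#treatment"},
    {"@id": "#group-b", "@type": "LabProcess", "executesLabProtocol": "#treatment"},
]

protocols_by_id = {protocol["@id"]: protocol}


def reagents_for(process):
    """Return the reagents reachable from a LabProcess via the protocol it executes."""
    executed = protocols_by_id[process["executesLabProtocol"]]
    return executed.get("reagent", [])


for p in processes:
    print(p["@id"], reagents_for(p))
```

Both groups report the same two reagents, so the data cannot say which group used which one.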

Shouldn't these properties be allowed in LabProcess too?

Moreover, in this issue it was proposed to add PropertyValue to the range of such properties. In my view (and from experience with the ISA model), making them sub-properties of parameterValue (in the case of LabProcess) and sub-properties of a new property named something like parameter, with FormalParameter in its range (in the case of LabProtocol), would simplify searches and use cases, since things like reagents or software names are often searched for among the lab parameters.
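A minimal sketch of why the sub-property route helps searching: if reagent and computationalTool were declared as rdfs:subPropertyOf parameterValue, a query for parameterValue would also return reagents and tools. The Python below computes the sub-property closure by hand over a set of hypothetical triples; the `bioschema:` prefix is an unresolved abbreviation, all instance identifiers are illustrative, and ranges such as PropertyValue are not checked. In a triple store, the same effect would come from RDFS inference or from a SPARQL 1.1 property path such as `?p rdfs:subPropertyOf* bioschema:parameterValue`.

```python
# Hypothetical (subject, predicate, object) triples; prefixes are abbreviations, not IRIs.
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

triples = {
    # Proposed modelling: reagent and computationalTool as sub-properties of parameterValue.
    ("bioschema:reagent", SUB_PROPERTY_OF, "bioschema:parameterValue"),
    ("bioschema:computationalTool", SUB_PROPERTY_OF, "bioschema:parameterValue"),
    # Instance data for one (hypothetical) LabProcess.
    ("#group-a", "bioschema:reagent", "#reagent-x"),
    ("#group-a", "bioschema:computationalTool", "#analysis-tool-v2"),
    ("#group-a", "bioschema:parameterValue", "#incubation-time"),
}


def sub_properties(prop, triples):
    """Return prop plus all of its direct and indirect sub-properties."""
    found = {prop}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p == SUB_PROPERTY_OF and o in found and s not in found:
                found.add(s)
                changed = True
    return found


def values(subject, prop, triples):
    """Values of prop on subject, including values asserted via its sub-properties."""
    props = sub_properties(prop, triples)
    return {o for s, p, o in triples if s == subject and p in props}


print(sorted(values("#group-a", "bioschema:parameterValue", triples)))
```

A search over parameterValue now finds the reagent and the software alongside the ordinary parameter, without the client knowing about the more specific properties.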

bioSample would be an exception to this, since to me it is either an input/output (in the case of LabProtocol) or an object/result (in the case of LabProcess).

In the latter case, schema:Action defines input/output with these terms (object/result), but not calling them input/output is very weird to a biologist, so I'd also make input/output sub-properties of object/result, and I'd prefer these property names in the context of life science. Alternatively, we could introduce more specific properties, such as labProtocolInput and labProtocolOutput (since schema:input/schema:output are such generic names that I'm pretty sure that sooner or later they will clash with some other desired meaning in the same or another application domain, if this hasn't already happened).

Finally, why does LabProtocol have both the properties bioSample and sample, with the same description and different ranges?

@floWetzels
Contributor

Hi Marco, thanks for your input! Here are a few comments on what you wrote. Looking forward to discussing this in our meeting.

> In the current draft description, LabProtocol has the properties bioSample, sample, computationalTool, labEquipment, reagent. LabProcess doesn't have any of these properties.
>
> As far as I know, this can be a problem when multiple protocol applications (i.e., LabProcess instances) are variants of the same protocol/plan (i.e., a LabProtocol instance, linked via executesLabProtocol with n-1 cardinality), especially in the case of bioSample. For instance, suppose that essentially the same treatment (protocol) is administered to all the biosamples in two treatment groups, and one group uses one reagent while the other group uses a different one (similar examples could be made with two versions of a software product or of a piece of lab machinery).
>
> Shouldn't these properties be allowed in LabProcess too?

The current LabProcess type was inspired by the ISA model. There, a protocol has constant components and variable parameters. We intended reagent, labEquipment and computationalTool to be used as components (they are part of the LabProtocol type), while the process defines the variable parameters. Multiple processes can implement the same protocol, differing only in their parameters and in their inputs/outputs. Therefore, if the components change, it is a different protocol (e.g., in your example, either you are describing two distinct protocols, or the reagents and the software should be described as parameters). Of course, such a restriction is up for discussion; we are only describing how the LabProcess type was designed.
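The ISA-style rule described above (constant components belong to the protocol, variable parameters to the process, and a change in components means a different protocol) can be sketched as a grouping step. The run records, the `components`/`parameters` keys, and the `group_into_protocols` helper are hypothetical illustrations, not Bioschemas terms.

```python
from collections import defaultdict

# Hypothetical descriptions of how each treatment group was actually run.
runs = [
    {"id": "#group-a", "components": {"#reagent-x", "#centrifuge-1"}, "parameters": {"temperature": "37 C"}},
    {"id": "#group-b", "components": {"#reagent-y", "#centrifuge-1"}, "parameters": {"temperature": "37 C"}},
    {"id": "#group-c", "components": {"#reagent-x", "#centrifuge-1"}, "parameters": {"temperature": "30 C"}},
]


def group_into_protocols(runs):
    """Runs with identical components share one protocol; their parameters may differ freely."""
    protocols = defaultdict(list)
    for run in runs:
        protocols[frozenset(run["components"])].append(run["id"])
    return dict(protocols)


for components, run_ids in group_into_protocols(runs).items():
    print(sorted(components), "->", run_ids)
```

Under this rule, #group-a and #group-c are two processes of one protocol (they differ only in a parameter), while #group-b, which swaps the reagent, executes a distinct protocol.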

> Moreover, in this issue it was proposed to add PropertyValue to the range of such properties. In my view (and from experience with the ISA model), making them sub-properties of parameterValue (in the case of LabProcess) and sub-properties of a new property named something like parameter, with FormalParameter in its range (in the case of LabProtocol), would simplify searches and use cases, since things like reagents or software names are often searched for among the lab parameters.

This indeed fits with what we explained above: annotating reagents and software as parameters should be perfectly fine. We believe the important part is to make a clear distinction between constant and variable properties in the vocabulary, but of course reagents and software can be both. As a short remark, we don't think that FormalParameter fits into the range of components and parameters, as it has a completely different semantic interpretation strictly tied to computational workflows.

> In the latter case, schema:Action defines input/output with these terms (object/result), but not calling them input/output is very weird to a biologist, so I'd also make input/output sub-properties of object/result, and I'd prefer these property names in the context of life science. Alternatively, we could introduce more specific properties, such as labProtocolInput and labProtocolOutput (since schema:input/schema:output are such generic names that I'm pretty sure that sooner or later they will clash with some other desired meaning in the same or another application domain, if this hasn't already happened).

As you stated, LabProcess is meant as a type describing the mapping from input to output. For those two terms, we have a perfect semantic mapping to existing ones, namely object and result. Of course, the terms input and output are more commonplace in the life-science community, and we use them too. However, we prioritize using those from the existing vocabulary, as they can also be understood from generic knowledge of schema.org.

@marco-brandizi
Author

Hi @floWetzels, thank you for your comments and sorry for my late reply.

> This indeed fits with what we explained above: annotating reagents and software as parameters should be perfectly fine. We believe the important part is to make a clear distinction between constant and variable properties in the vocabulary, but of course reagents and software can be both.

I understand this as follows: I could define an RDF resource (URI) that is an instance of BioChemEntity and is pointed to by both bioschema:reagent and bioschema:parameterValue. That would be good and would ensure the distinction between constant and variable components that you mentioned. However, the problem is that right now, parameterValue has only PropertyValue in its range.

A reagent could be made an instance of PropertyValue (PV) too, but that is too ugly. An alternative is to model a reagent as a reagent when it is constant and as a PV (with name and value) when it varies. This is problematic because it introduces two different ways to model the same thing.

So, another option is what I was initially suggesting: properties like reagent could be used for LabProcess too; they could even be sub-properties of parameter/parameterValue. When they don't vary across the applications of a LabProtocol, the corresponding LabProcess instances wouldn't have further values attached (i.e., a LabProtocol may have default/constant parameters).
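A minimal sketch of this option, assuming reagent is allowed on both LabProtocol and LabProcess. The precedence rule (a value on the process overrides the protocol's default) is an assumption for illustration; neither the draft nor this thread defines it. The records and the `effective_value` helper are hypothetical.

```python
# Hypothetical records following the proposal: reagent allowed on both types.
protocol = {
    "@id": "#treatment",
    "@type": "LabProtocol",
    "reagent": "#reagent-x",          # default/constant value
    "labEquipment": "#centrifuge-1",  # never overridden in this example
}

group_a = {"@id": "#group-a", "@type": "LabProcess", "executesLabProtocol": "#treatment"}
group_b = {
    "@id": "#group-b",
    "@type": "LabProcess",
    "executesLabProtocol": "#treatment",
    "reagent": "#reagent-y",          # this application varies the reagent
}


def effective_value(process, protocol, prop):
    """Assumed rule: use the process's own value if present, else the protocol default."""
    if prop in process:
        return process[prop]
    return protocol.get(prop)


for proc in (group_a, group_b):
    print(proc["@id"], effective_value(proc, protocol, "reagent"),
          effective_value(proc, protocol, "labEquipment"))
```

Group A inherits the protocol's reagent, group B records its own, and both still share one LabProtocol, so the same thing is always modelled with the same property.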

> As a short remark, we don't think that FormalParameter fits into the range of components and parameters, as it has a completely different semantic interpretation strictly tied to computational workflows.

I see a couple of issues with this (similar ones occur here and there in both schema.org and Bioschemas):

  1. neither the name nor the definition of FormalParameter restrict it to computational workflows, rather both are more generic than that (A FormalParameter is an identified variable used to stand for the actual value(s) that are consumed/produced by a set of steps).
  2. in its current draft definition, LabProtocol can have the input and output properties, both of which have FormalParameter as their range. So, input, output and FormalParameter all seem to be used for more than computational workflows (or are they used inconsistently?).

> For those two terms, we have a perfect semantic mapping to existing ones, namely object and result. Of course, the terms input and output are more commonplace in the life-science community, and we use them too. However, we prioritize using those from the existing vocabulary, as they can also be understood from generic knowledge of schema.org.

Inheriting from the top is the sensible thing to do. In fact, I'm not proposing alternatives to object/result, but synonyms or sub-properties of them. Moreover, as said above, input/output are properties being proposed for LabProtocol (I didn't notice this the first time), but with yet another meaning. As a data engineer, I find it confusing; an average biologist would find it very confusing :-)
