Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCHEMA: Proposals to standardize schema/objects/*.yaml formats #1088

Closed
rwblair opened this issue Apr 21, 2022 · 9 comments · Fixed by #1107
Closed

SCHEMA: Proposals to standardize schema/objects/*.yaml formats #1088

rwblair opened this issue Apr 21, 2022 · 9 comments · Fixed by #1107
Labels
opinions wanted Please read and offer your opinion on this matter schema-structure Changes to the fundamental organization/structure of the YAML schema. Minor version release.

Comments

@rwblair
Copy link
Member

rwblair commented Apr 21, 2022

From conversations in #1086 the key values and name fields are not used entirely consistently in the schema objects.
columns.yaml uses the name field value as the actual header column name we look for in tsv headers.
entities.yaml uses the entity field value as the literal entity that we look for in file names
suffixes.yaml uses the entries key as the literal suffix that we look for in file names.

Except for columns.yaml name is used as pretty printable/user friendly display name for the object.

The key used for an object is always how we refer to an object within the specification.

suffix and columns should have the values that we look for in datasets moved from their respective places into a new field.

Proposal 1 is to follow entity suite and have a special key-name unique to that object type. So from columns.yaml this:

acq_time__scans:
  name: acq_time
  description: |
    Acquisition time refers to when the first data point in each run was acquired.

Becomes:

acq_time__scans:
  name: Acquisition Time
  column: acq_time
  description: |
    Acquisition time refers to when the first data point in each run was acquired.

Proposal 2 would be to standardize on a single key name across object to represent in dataset values. If we followed columns suite and used name for this, pretty print values could shift to display_name:

# suffix
TwoPE:  # some unambiguous reference that works for MATLAB and ancpBIDS
  name: 2PE  # The actual value used in the specification
  display_name: 2-photon excitation microscopy  # The pretty name
  description: |
    2-photon excitation microscopy imaging data

# entity
inversion:  # also used for variable names in pybids
  name: inv
  display_name: Inversion Time
  description: |
    If files belonging to an entity-linked file collection are acquired at different
    inversion times, the `_inv-<index>` key/value pair MUST be used to distinguish
    individual files.
    This entity represents the `"InversionTime` metadata field. Please note that the `<index>`
    denotes the number/index (in the form of a nonnegative integer), not the `"InversionTime"`
    value which needs to be stored in the field `"InversionTime"` of the separate JSON file.
  type: string
  format: index

# column
type__electrodes:
  name: type
  display_name: Electrode Type
  description: |
    Type of the electrode (for example, cup, ring, clip-on, wire, needle).
  type: string

Instead of name there may be some other more appropriate key name to use, some floated during the 21.04.2022 schema meeting were alias, value, shorthand, etc but none felt quite right.

I would like to solicit input on the proposals and other potential solutions.

@rwblair rwblair added opinions wanted Please read and offer your opinion on this matter schema-structure Changes to the fundamental organization/structure of the YAML schema. Minor version release. labels Apr 21, 2022
@rwblair rwblair changed the title Proposals to standardize schema/objects/*.yaml formats SCHEMA: Proposals to standardize schema/objects/*.yaml formats Apr 21, 2022
@rwblair
Copy link
Member Author

rwblair commented Apr 21, 2022

@bids-standard/schema-users

@tsalo
Copy link
Member

tsalo commented Apr 21, 2022

Also want to throw in an extension example:

objects:

any:
  name: ".*"
  display_name: Any Extension
  description: |
    Any extension is allowed

none:
  name: ""
  display_name: No Extension
  description: |
    A file with no extension.

rules/datatypes/:

- suffixes:
  - headshape
  extensions:
  - any
  - none
  entities:
    subject: required
    session: optional
    acquisition: optional

@effigies
Copy link
Collaborator

effigies commented Apr 21, 2022

I think there was one additional proposal, which is to distinguish datatype, suffix and extension as special kinds, having a "value" rather than a name:

# Suffix
TwoPE:  # some unambiguous reference that works for MATLAB and ancpBIDS
  value: 2PE  # The actual value used in the specification
  display_name: 2-photon excitation microscopy  # The pretty name
  description: |
    2-photon excitation microscopy imaging data

# Datatype
anatomical:
  value: anat
  display_name: Anatomical Magnetic Resonance Imaging
  description: |
    Magnetic resonance imaging sequences designed to characterize static, anatomical features.

# Extension
bdf:
  value: ".bdf"
  display_name: Biosemi Data Format
  description: |
    A [Biosemi](https://www.biosemi.com/) Data Format file.

    Each recording consists of a single `.bdf` file.
    [`bdf+`](https://www.teuniz.net/edfbrowser/bdfplus%20format%20description.html) files are permitted.
    The capital `.BDF` extension MUST NOT be used.

Meanwhile entities, columns and metadata could follow the convention 2:

# entity
inversion:  # also used for variable names in pybids
  name: inv
  display_name: Inversion Time
  description: |
    If files belonging to an entity-linked file collection are acquired at different
    inversion times, the `_inv-<index>` key/value pair MUST be used to distinguish
    individual files.
    This entity represents the `"InversionTime` metadata field. Please note that the `<index>`
    denotes the number/index (in the form of a nonnegative integer), not the `"InversionTime"`
    value which needs to be stored in the field `"InversionTime"` of the separate JSON file.
  type: string
  format: index

# column
type__electrodes:
  name: type
  display_name: Electrode Type
  description: |
    Type of the electrode (for example, cup, ring, clip-on, wire, needle).
  type: string

# Metadata
AcquisitionDuration:
  name: AcquisitionDuration
  display_name: Acquisition Duration
  description: |
    Duration (in seconds) of volume acquisition.
    Corresponds to DICOM Tag 0018, 9073 `Acquisition Duration`.
    This field is mutually exclusive with `"RepetitionTime"`.
  type: number
  exclusiveMinimum: 0
  unit: s

That said, metadata is also a bit of its own thing as well, since they are all valid identifiers and we never reference them except as literals in the spec, "<key>" == <key>.name == <key>.display_name. So maybe we should not try to uniformize it in this process. Edit: After discussion with @tsalo, there could be times when we want a unique metadata key, and a display name might be useful for other renderers than the spec.

@TheChymera
Copy link
Collaborator

This looks good, and making sure everything in rules has an object definition reduces the potential for confusion — at the cost of requiring more lookup to contribute (i.e. if you want to add a new extension you now need to edit two files, at least). However I'm not sure whether for the special case of extension it really makes sense. Do we really need to ship documentation for what .json means? We don't define any of these formats here, we don't want to make them diverge from their base standard, and we don't want to have to keep track here if things in the standard change.

The tricky parts about the extension were None, /, and .*. None simply means false (which is a boolean value which programming languages automatically understand), / and its combination with other extensions (like .ngff/) are not really made easier by this, and .* will still need to be regexified explicitly.

Not opposed to this necessarily, just wanted to make sure I'm not missing anything or that we're not adding too much unnecessary lookup steps to the schema....

@erdalkaraca
Copy link
Collaborator

I would go for proposal 2: the term 'column' may lead to confusion without any further context.

From conversations in #1086 the key values and name fields are not used entirely consistently in the schema objects. columns.yaml uses the name field value as the actual header column name we look for in tsv headers. entities.yaml uses the entity field value as the literal entity that we look for in file names suffixes.yaml uses the entries key as the literal suffix that we look for in file names.

Except for columns.yaml name is used as pretty printable/user friendly display name for the object.

The key used for an object is always how we refer to an object within the specification.

suffix and columns should have the values that we look for in datasets moved from their respective places into a new field.

Proposal 1 is to follow entity suite and have a special key-name unique to that object type. So from columns.yaml this:

acq_time__scans:
  name: acq_time
  description: |
    Acquisition time refers to when the first data point in each run was acquired.

Becomes:

acq_time__scans:
  name: Acquisition Time
  column: acq_time
  description: |
    Acquisition time refers to when the first data point in each run was acquired.

Proposal 2 would be to standardize on a single key name across object to represent in dataset values. If we followed columns suite and used name for this, pretty print values could shift to display_name:

# suffix
TwoPE:  # some unambiguous reference that works for MATLAB and ancpBIDS
  name: 2PE  # The actual value used in the specification
  display_name: 2-photon excitation microscopy  # The pretty name
  description: |
    2-photon excitation microscopy imaging data

# entity
inversion:  # also used for variable names in pybids
  name: inv
  display_name: Inversion Time
  description: |
    If files belonging to an entity-linked file collection are acquired at different
    inversion times, the `_inv-<index>` key/value pair MUST be used to distinguish
    individual files.
    This entity represents the `"InversionTime` metadata field. Please note that the `<index>`
    denotes the number/index (in the form of a nonnegative integer), not the `"InversionTime"`
    value which needs to be stored in the field `"InversionTime"` of the separate JSON file.
  type: string
  format: index

# column
type__electrodes:
  name: type
  display_name: Electrode Type
  description: |
    Type of the electrode (for example, cup, ring, clip-on, wire, needle).
  type: string

Instead of name there may be some other more appropriate key name to use, some floated during the 21.04.2022 schema meeting were alias, value, shorthand, etc but none felt quite right.

I would like to solicit input on the proposals and other potential solutions.

@effigies
Copy link
Collaborator

My two-type proposal (#1088 (comment)) was accepted at today's meeting. A big PR that will be Schema-version-bumping will be needed soon.

@soichih
Copy link

soichih commented May 12, 2022

Will the update be made to 1.7.1? Do you know when 1.7.1 gets released?

@effigies
Copy link
Collaborator

I expect to do it before the next release, since it's a pretty constrained task, but @sappelhoff or @franklin-feingold would know better when to expect another release. I've been out of the loop for regular spec maintenance in the last month or so.

@franklin-feingold
Copy link
Collaborator

I don't have a release in mind until another BEP is incorporated into the specification - unless folks feel a patch bump is needed before then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
opinions wanted Please read and offer your opinion on this matter schema-structure Changes to the fundamental organization/structure of the YAML schema. Minor version release.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants