Create column attribute in schema to store precise column names/prefixes #48

alyssadai · 2023-05-17T05:05:51Z

A few updates should be made to the bagel schema.

Currently, missing values are defined for pipeline_name and pipeline_version, even though right now a subject-session is always expected to have a value for both of these columns

Currently in bagel_schema.json, the key for a given column is the exact expected column name, e.g.

        "PHASE__": {
            "Description": ...,
            "dtype": "str",
            "IsRequired": false,
            "Range": ["SUCCESS", "FAIL", "INCOMPLETE", "UNAVAILABLE"],
            "IsPrefixedColumn": true

(where "PHASE__" is the expected column prefix). But on mr_proc's end, this means you currently need to manually update pipeline trackers (e.g., fmriprep_tracker.py) when there's a change to the column names in the schema, due to how these names are 'hard-coded' in the tracker_configs variables, e.g.
https://github.com/neurodatascience/mr_proc/blob/43b3874af8e74167fba03e9abb096603264e0494/trackers/fmriprep_tracker.py#L104-L116

To avoid needing to update all the pipeline trackers if the column names change, it could be helpful to just encode the actual column name as another field, and treating the keys essentially 'variable names' instead (which won't change). So, for example, the snippet above would become something like:

        "phase": {
            "Label": "PHASE__",
            "Description": ...,
            "dtype": "str",
            "IsRequired": false,
            "Range": ["SUCCESS", "FAIL", "INCOMPLETE", "UNAVAILABLE"],
            "IsPrefixedColumn": true

(For most of the non-prefixed columns, the key and Label would probably be the same). This way, to fetch the appropriate prefix of "phase" columns for example when constructing the bagel.csv, you could just access the value of something like schema_group["phase"]["ColumnLabel"] (if schema_group is your dict storing the "PIPELINE_STATUS_COLUMNS" group from the schema). This way, even if the ColumnLabel value changes in the schema, when you generate a new bagel.csv, the columns should be updated automatically.

From a quick scan of the mr_proc tracker scripts, it looks like with this change the function tracker.get_pipe_tasks might also need to be updated.

Originally posted by @alyssadai in #44 (comment)

Steps to implement

Remove MissingValue attribute from pipeline_name, pipeline_version
Add "Label" key to each column in schema
Update schema README
Update function in utility.py that grabs the names of required columns from the schema to use Label instead of keys

The text was updated successfully, but these errors were encountered:

nikhil153 · 2023-05-17T18:54:09Z

@alyssadai, I think it's a good idea for future when / if things scale up :)

github-actions · 2023-08-18T01:56:57Z

We want to keep our issues up to date and active. This issue hasn't seen any activity in the last 30 days.
We have applied the stale-issue label to indicate that this issue should be reviewed again and then either prioritized or closed.

github-actions · 2023-09-17T02:01:45Z

We want to keep our issues up to date and active. This issue hasn't seen any activity in the last 30 days.
We have applied the stale-issue label to indicate that this issue should be reviewed again and then either prioritized or closed.

alyssadai mentioned this issue May 17, 2023

Update raw data + pipeline-specific columns in bagel schema #44

Closed

6 tasks

alyssadai mentioned this issue May 18, 2023

Implement case-insensitivity to column names when checking against the bagel schema #25

Closed

2 tasks

alyssadai self-assigned this Jun 14, 2023

github-actions bot added the _flag:stale [BOT ONLY] Flag issue that hasn't been updated in a while and needs to be triaged again label Aug 18, 2023

github-actions bot removed the @ flag:stale label Sep 18, 2023

surchs added someday Not a priority right now, but we want to keep this around to think or discuss more. and removed _flag:stale [BOT ONLY] Flag issue that hasn't been updated in a while and needs to be triaged again labels Dec 5, 2023

surchs unassigned alyssadai Dec 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create column attribute in schema to store precise column names/prefixes #48

Create column attribute in schema to store precise column names/prefixes #48

alyssadai commented May 17, 2023 •

edited

Loading

nikhil153 commented May 17, 2023

github-actions bot commented Aug 18, 2023

github-actions bot commented Sep 17, 2023

Create column attribute in schema to store precise column names/prefixes #48

Create column attribute in schema to store precise column names/prefixes #48

Comments

alyssadai commented May 17, 2023 • edited Loading

Steps to implement

nikhil153 commented May 17, 2023

github-actions bot commented Aug 18, 2023

github-actions bot commented Sep 17, 2023

alyssadai commented May 17, 2023 •

edited

Loading