Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create column attribute in schema to store precise column names/prefixes #48

Open
3 of 4 tasks
alyssadai opened this issue May 17, 2023 · 3 comments
Open
3 of 4 tasks
Labels
someday Not a priority right now, but we want to keep this around to think or discuss more.

Comments

@alyssadai
Copy link
Collaborator

alyssadai commented May 17, 2023

A few updates should be made to the bagel schema.

  1. Currently, missing values are defined for pipeline_name and pipeline_version, even though right now a subject-session is always expected to have a value for both of these columns

Currently in bagel_schema.json, the key for a given column is the exact expected column name, e.g.

        "PHASE__": {
            "Description": ...,
            "dtype": "str",
            "IsRequired": false,
            "Range": ["SUCCESS", "FAIL", "INCOMPLETE", "UNAVAILABLE"],
            "IsPrefixedColumn": true

(where "PHASE__" is the expected column prefix). But on mr_proc's end, this means you currently need to manually update pipeline trackers (e.g., fmriprep_tracker.py) when there's a change to the column names in the schema, due to how these names are 'hard-coded' in the tracker_configs variables, e.g.
https://github.com/neurodatascience/mr_proc/blob/43b3874af8e74167fba03e9abb096603264e0494/trackers/fmriprep_tracker.py#L104-L116

To avoid needing to update all the pipeline trackers if the column names change, it could be helpful to just encode the actual column name as another field, and treating the keys essentially 'variable names' instead (which won't change). So, for example, the snippet above would become something like:

        "phase": {
            "Label": "PHASE__",
            "Description": ...,
            "dtype": "str",
            "IsRequired": false,
            "Range": ["SUCCESS", "FAIL", "INCOMPLETE", "UNAVAILABLE"],
            "IsPrefixedColumn": true

(For most of the non-prefixed columns, the key and Label would probably be the same). This way, to fetch the appropriate prefix of "phase" columns for example when constructing the bagel.csv, you could just access the value of something like schema_group["phase"]["ColumnLabel"] (if schema_group is your dict storing the "PIPELINE_STATUS_COLUMNS" group from the schema). This way, even if the ColumnLabel value changes in the schema, when you generate a new bagel.csv, the columns should be updated automatically.

From a quick scan of the mr_proc tracker scripts, it looks like with this change the function tracker.get_pipe_tasks might also need to be updated.

Originally posted by @alyssadai in #44 (comment)

Steps to implement

  • Remove MissingValue attribute from pipeline_name, pipeline_version
  • Add "Label" key to each column in schema
  • Update schema README
  • Update function in utility.py that grabs the names of required columns from the schema to use Label instead of keys
@nikhil153
Copy link
Contributor

@alyssadai, I think it's a good idea for future when / if things scale up :)

@github-actions
Copy link

We want to keep our issues up to date and active. This issue hasn't seen any activity in the last 30 days.
We have applied the stale-issue label to indicate that this issue should be reviewed again and then either prioritized or closed.

@github-actions github-actions bot added the _flag:stale [BOT ONLY] Flag issue that hasn't been updated in a while and needs to be triaged again label Aug 18, 2023
@github-actions
Copy link

We want to keep our issues up to date and active. This issue hasn't seen any activity in the last 30 days.
We have applied the stale-issue label to indicate that this issue should be reviewed again and then either prioritized or closed.

@surchs surchs added someday Not a priority right now, but we want to keep this around to think or discuss more. and removed _flag:stale [BOT ONLY] Flag issue that hasn't been updated in a while and needs to be triaged again labels Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
someday Not a priority right now, but we want to keep this around to think or discuss more.
Projects
Status: No status
Development

No branches or pull requests

3 participants