Use BIDS schema to "drive" pybids #818

Open
yarikoptic opened this issue Feb 17, 2022 · 7 comments

Comments

@yarikoptic
Collaborator

#807 prompted me to search here, and I found no existing issue proposing that pybids read and use the BIDS schema to avoid hardcoding BIDS assumptions in its Python code.

My current greedy desire is to avoid reinventing the wheel of parsing/changing BIDS filenames (fresh hits nipy/heudiconv#544, https://github.com/physiopy/phys2bids/pull/374/files) so they remain BIDS-compliant. The only scalable way I see is to use the official BIDS schema, which is actively developed (effort led by @tsalo) and is now released as part of the 1.7.0 BIDS release.
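To make the filename concern concrete, here is a minimal sketch of parsing a BIDS-style filename and rebuilding it with entities in a fixed order. The `ENTITY_ORDER` list is an abbreviated, hand-written assumption; the point of this issue is that a real tool would load the order from the schema rather than hardcode it. The function names are hypothetical and not part of pybids.

```python
# Assumed, abbreviated entity order. A schema-driven tool would load this
# from the BIDS schema instead of hardcoding it here.
ENTITY_ORDER = ["sub", "ses", "task", "acq", "run"]


def parse_bids_name(filename):
    """Split a BIDS-style filename into (entities, suffix, extension)."""
    # Everything after the first dot is treated as the extension (e.g. ".nii.gz").
    stem, dot, ext = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep or not value:
            raise ValueError(f"not a key-value entity: {pair!r}")
        entities[key] = value
    return entities, suffix, dot + ext


def build_bids_name(entities, suffix, extension, order=ENTITY_ORDER):
    """Rebuild a filename, emitting entities in the given order."""
    unknown = set(entities) - set(order)
    if unknown:
        raise ValueError(f"entities not in the order list: {sorted(unknown)}")
    parts = [f"{key}-{entities[key]}" for key in order if key in entities]
    return "_".join(parts + [suffix]) + extension


entities, suffix, ext = parse_bids_name("sub-01_task-rest_run-2_bold.nii.gz")
entities["ses"] = "pre"  # add an entity; the rebuild places it in order
new_name = build_bids_name(entities, suffix, ext)
```

Swapping the hardcoded list for one read from the schema is the only change needed to make the rebuild step schema-driven.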

I do appreciate that the "schema" of the BIDS schema is itself in flux and could change in ways that break the code reading it (I submitted bids-standard/bids-specification#1013 to at least possibly reduce the "surprise"). But I think it would be valuable to start work in pybids on using the "official" schema, even if not the one matching the dataset_description.json BIDSVersion, but some "trusted/tested" version of it (1.7.0 is a good one -- has microscopy for #807 ;))
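The version choice described above could look roughly like this: read `BIDSVersion` from `dataset_description.json` and fall back to a trusted version when the declared one is missing or not available. This is only a sketch; `pick_schema_version`, `AVAILABLE_VERSIONS`, and `TRUSTED_VERSION` are hypothetical names, not existing pybids API.

```python
import json
from pathlib import Path

TRUSTED_VERSION = "1.7.0"       # the fallback suggested in this thread
AVAILABLE_VERSIONS = {"1.7.0"}  # hypothetical: schema versions the tool ships with


def pick_schema_version(dataset_root, available=AVAILABLE_VERSIONS,
                        fallback=TRUSTED_VERSION):
    """Return the dataset's declared BIDSVersion if available, else the fallback."""
    desc_path = Path(dataset_root) / "dataset_description.json"
    try:
        description = json.loads(desc_path.read_text())
    except (OSError, json.JSONDecodeError):
        # Missing or unreadable description: use the trusted version.
        return fallback
    declared = description.get("BIDSVersion") if isinstance(description, dict) else None
    return declared if declared in available else fallback
```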

I have just populated https://github.com/bids-standard/bids-schema with a recent release (1.7.0) of the BIDS schema, so it could either be included in pybids as a submodule or simply downloaded as an external resource to get access to multiple BIDS versions.

WDYT?

@adelavega
Collaborator

This is definitely on our mental road map, so thanks for formalizing it.

What I wonder is whether this will primarily replace the internal bids.layout config:

i.e., https://github.com/bids-standard/pybids/blob/master/bids/layout/config/bids.json

or if the changes will require a more substantial refactor of pybids.

@effigies
Collaborator

effigies commented Mar 1, 2022

I think in practice it will mostly replace the <config>.json files. I'm not sure what else we do that can really be schema-driven. Obviously we could have a Python implementation of the validator, but that would just be a new module.
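As a rough illustration of what "replacing the config files" might mean, the sketch below derives one regex per entity from schema-like definitions instead of listing the patterns by hand. The `SCHEMA_ENTITIES` dict is a simplified, hypothetical stand-in for the schema's entity definitions (a short key plus a `label` or `index` format), and the value patterns are assumptions; neither is copied from the schema or from pybids.

```python
import re

# Hypothetical, simplified stand-in for schema entity definitions.
SCHEMA_ENTITIES = {
    "subject": {"entity": "sub", "format": "label"},
    "session": {"entity": "ses", "format": "label"},
    "task": {"entity": "task", "format": "label"},
    "run": {"entity": "run", "format": "index"},
}

# Assumed value formats: labels are alphanumeric, indices are digits.
FORMAT_PATTERNS = {"label": "[a-zA-Z0-9]+", "index": "[0-9]+"}


def entity_patterns(schema_entities):
    """Derive one compiled regex per entity from schema-like definitions."""
    patterns = {}
    for name, spec in schema_entities.items():
        value = FORMAT_PATTERNS[spec["format"]]
        key = re.escape(spec["entity"])
        # The key must start the path or follow a path separator or underscore.
        patterns[name] = re.compile(rf"(?:^|[/\\_]){key}-({value})")
    return patterns


def extract_entities(path, patterns):
    """Return the first value found in the path for each entity."""
    found = {}
    for name, pattern in patterns.items():
        match = pattern.search(path)
        if match:
            found[name] = match.group(1)
    return found


patterns = entity_patterns(SCHEMA_ENTITIES)
found = extract_entities(
    "sub-01/ses-pre/func/sub-01_ses-pre_task-rest_run-02_bold.nii.gz", patterns
)
```

With this shape, supporting a new entity means updating the schema data, not the Python code.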

@tsalo
Member

tsalo commented Mar 1, 2022

I'd love to have the ancp-bids devs present their approach if and when they're able to reproduce all of BIDSLayout's functionality. Their code directly uses the schema, I believe.

@adelavega
Collaborator

@tsalo are there aspects of the BIDS Schema that are beyond the scope of the current config files?

That is, in your view have we hard coded other assumptions into Python code?

If not, then I'm pretty hesitant to refactor pyBIDS prematurely.

@tsalo
Member

tsalo commented Mar 22, 2022

@adelavega sorry for the delay in responding.

are there aspects of the BIDS Schema that are beyond the scope of the current config files?

There are definitely plans to extend the schema in a way that would go beyond the config files (e.g., bids-standard/bids-specification#1029), but that's primarily for validation.

That is, in your view have we hard coded other assumptions into Python code?

I am not familiar enough with most of the pybids code to have an informed opinion on this, but my understanding is that the ancp-bids approach would mostly be beneficial in three ways:

  1. Much faster on large datasets.
  2. No need for SQL, which I believe has been a bottleneck for contributors.
    • I haven't done much with BIDSLayout, but my understanding was that Tal was the main driver behind the SQL stuff, and now that he's moved away from pybids that element is harder to maintain.
    • To be fair, I don't know if the ancp-bids approach is any easier to learn, but I guess that's something they could present on.
  3. Automatic ingestion and use of the schema, so hopefully there wouldn't be any need to regularly update the config files.

@erdalkaraca

erdalkaraca commented Apr 1, 2022

As for ancpBIDS:

  • instead of SQL, we have a custom query language without any external dependencies
  • the data structure used to query against is an in-memory graph representation of the BIDS dataset
  • the schema definitions are used to generate code that can assist in developing pipelines using professional IDEs (like PyCharm or VS Code) AND to drive the validation
  • most concepts are implemented as plug-ins
  • each BIDS schema version is represented in its own Python module, which allows the schema to be chosen dynamically according to the BIDSVersion field in the dataset (at the moment, 1.7.0 and 1.7.1 are available)
  • the most important aspects of BIDSLayout are covered, with no modality-specific functions, to keep the API clean
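For readers unfamiliar with the idea, the sketch below shows what querying an in-memory tree of a dataset can look like without SQL. It is a toy illustration only and does not reproduce ancpBIDS's actual data structures or query language; `Node` and `query` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One folder or file in the dataset tree."""
    name: str
    entities: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it for chaining."""
        self.children.append(child)
        return child

    def walk(self):
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def query(root, **filters):
    """Return names of nodes whose entities match every filter."""
    return [
        node.name
        for node in root.walk()
        if all(node.entities.get(key) == value for key, value in filters.items())
    ]


root = Node("dataset")
sub = root.add(Node("sub-01", {"sub": "01"}))
sub.add(Node("sub-01_task-rest_bold.nii.gz",
             {"sub": "01", "task": "rest", "suffix": "bold"}))
sub.add(Node("sub-01_T1w.nii.gz", {"sub": "01", "suffix": "T1w"}))

bold_files = query(root, suffix="bold")
```

The whole dataset lives in plain Python objects, so a query is a tree walk with a predicate rather than a database round trip.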

Let me know if you would like a technical introduction.

Additional note:
There is an example analysis pipeline that demonstrates how ancpBIDS can be used:
https://github.com/ANCPLabOldenburg/ancp-bids-apps/blob/main/ancpbidsapps/analysis/nilearn_first_level.py

I recently had a discussion with the nilearn devs on Discord about using ancpBIDS as their BIDS support library. I am waiting for further details from them.

@adelavega
Collaborator

Hi @erdalkaraca, thank you for the summary. I have to take a deeper dive myself, but what you've done is certainly impressive and seems to have some major benefits, especially with respect to performance and extensibility (i.e., the BIDS schema). Nicely done!

I don't want to derail this issue, so I made a new one: #831
