[INFRA]: post-#475 - migrate schema access code (Python API) to bids-schema #543

yarikoptic opened this issue Jul 24, 2020 · 13 comments

@yarikoptic
Collaborator

Continuing the discussion that started in #475, which should be handled some time after #475 is merged and a new BIDS spec version is released.

My thinking on moving the support code to bids-schema was to facilitate programmatic access to previous versions of the schema, so that validators etc. could validate against the actual schema version of a dataset instead of the most recent one (or whatever single version they happen to have available), and so that the schema access API/code could be tested against older versions. Actually, probably not all code should migrate: the schema access code/API should migrate, while helpers to "render markdown" should reside here, since that is where they are most pertinent and they are not needed by 3rd party applications that only want to access the schema itself.

While the actual "current" schema would reside here (and only be archived to bids-schema upon release), released code should not need changes for schema/docs changes in BIDS itself. Any changes to the code that renders data within bids-specification would then be done in this repository.

There will be cases where code changes would be necessary to accommodate new desires in the schema itself! For those developments it should be relatively easy to establish a branch within bids-schema with the needed code changes, while also working here on the actual schema changes. It would just require a bids-schema code release before the PR here could be merged. A bit cumbersome... but it could work.

Having said all that, I think all code should reside here at least for some time so that schema and code "solidify" a bit more...

Alternative: do not move code at all, and rather

  • make bids-specification itself provide a Python package (bids_schema) with the API
  • this might require releases of the bids_schema Python package without releasing bids-specification itself

Although this sounds simple(r), I think such a mix of code + specification might eventually bite back painfully.
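
For illustration, here is a minimal sketch of what the versioned schema access described above (validators pinning the released schema version of a dataset) could look like; the archive layout and function names are hypothetical, not an existing API:

    # Hypothetical: assumes an archive directory with one subfolder per released
    # schema version, each containing the YAML files of that release.
    from pathlib import Path
    import yaml  # assumes PyYAML is available

    SCHEMA_ARCHIVE = Path("bids-schema/versions")

    def load_schema(version):
        """Read every YAML file of one released schema version into a dict."""
        return {p.stem: yaml.safe_load(p.read_text())
                for p in sorted((SCHEMA_ARCHIVE / version).rglob("*.yaml"))}

    # e.g. validate a dataset against the version it declares:
    # schema = load_schema("1.6.0")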

@tsalo added the schema label Sep 23, 2020
@tsalo
Member

tsalo commented Nov 2, 2020

In #610, the schema code is restructured into a more package-like structure. @sappelhoff brought up the importance of fully converting the code to a package, with a proper setup, unit testing, and linting. The code contains a range of functions, though, and only some of them are generic BIDS code; the rest are specific to rendering the specification. I'm not sure if we want to split up the package, though, so I'd love to get folks' thoughts on this.

@yarikoptic
Collaborator Author

I think splitting it out should indeed be done, but since the schema and the supporting code are still "in flux", I am not sure whether splitting it into a separate repository would cause a burden that could be avoided by keeping it here for now. For now it could indeed be worked into a proper package within this repo for the "schema" (but not rendering) support code, to be installed and tested as part of the workflows.

@tsalo
Member

tsalo commented Nov 2, 2020

What kind of structure do you think would work best? Something like this?

tools/
    schema/
        # package of schema-related code
    schemarendering/
        # package of rendering functions, with schema package as a dependency
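
For illustration, a rough self-contained sketch of the split this layout implies: a generic schema-reading helper (what tools/schema/ would hold) and a rendering helper that only consumes its output (what tools/schemarendering/ would hold). The function names and the entity layout assumed below are illustrative, not the actual code in #610:

    from pathlib import Path
    import yaml  # assumes PyYAML is available

    def load_yaml_tree(schema_dir):
        """Generic schema access: read every YAML file under schema_dir."""
        return {p.stem: yaml.safe_load(p.read_text())
                for p in Path(schema_dir).rglob("*.yaml")}

    def render_entities_markdown(schema):
        """Spec-specific rendering: turn entity definitions into Markdown bullets."""
        entities = schema.get("entities", {})
        return "\n".join(f"- `{name}`: {info.get('description', '')}"
                         for name, info in entities.items())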

@yarikoptic
Collaborator Author

Dear @tsalo. In the DANDI archive project we need to add BIDS dataset validation (dandi/dandi-schema#74), and I think we should use the src/schema within BIDS, or more precisely its releases (since datasets could be of older versions), via a set of supporting Python interfaces.
IIRC there was some (more) recent discussion on the potential future of src/schema and the Python interfaces to use it, but I fail to find it... But maybe it is time to resurrect the bids-schema effort and consider establishing the desired "pipeline" for releasing etc.

FWIW, in DANDI we have

I don't want to pollute this issue with ideas, but would be happy to discuss this topic further and summarize "lessons learned" (from dandi) which might help here.

@tsalo
Member

tsalo commented Aug 13, 2021

I recently met with @erdalkaraca and Jochem Rieger, who are working on Python tools to interact with the schema, as well as validate and search BIDS datasets using the schema directly. In order to not step on anyone's toes, they're working on that in https://github.com/ANCPLabOldenburg/ancp-bids (see bids-standard/pybids-light#1 as well), but they hope to integrate the new code into pybids directly.

Based on that discussion, it seems like functions for working with the schema would fit best in pybids, while rendering code in the specification repository would use pybids as a dependency, and bids-schema could be used to back up versions of the schema, as we've discussed. That way, if the structure of the schema changes, then we can update old versions in bids-schema without having to try to change specification releases.

Does that fit with what you're proposing, or would you prefer a different strategy?

@yarikoptic
Collaborator Author

yarikoptic commented Aug 13, 2021

Great to hear that there is more interest/work ongoing!

they hope to integrate the new code into pybids directly

That is what I am not sure about, since pybids itself is a bit too heavy with all of its dependencies (pandas etc.); that is why I somewhat liked the idea of pybids-light. An alternative would be to look into making the default pybids installation more lightweight, but I am not sure how good that idea is.
That is why I was thinking about first just beefing up the "stock" bids-validator Python module which pybids uses, so the effect would benefit everyone: anyone interested in just validating BIDS without needing to install heavy pybids, as well as all pybids users.

But then where should the code to read the model reside? It could be done in bids-validator, but that would be suboptimal, so it should be yet another module. Placing it into pybids makes it impossible to use with bids-validator, thus making that obsolete and just getting us back to heavy pybids...

So I am not yet sure about the best course of action, besides a dedicated lightweight bids-schema Python module or making pybids lighter (unlikely to happen, right @effigies @tyarkoni?)

That way, if the structure of the schema changes, then we can update old versions in bids-schema without having to try to change specification releases.

Yes, I guess, kinda ;-) I think that bids-schema should contain the original releases of the schema as they were released within bids-specification. If there is a structural change, the library which supports reading it should either auto-migrate upon load if needed, or we should have a <version>/<target_version>/ hierarchy, where the original would go under <version>/<version> and anything migrated/tuned would go into the corresponding versioned subfolder.

Funny enough, we are going through pretty much exactly the same discussion in DANDI about its schema migration: auto vs explicit etc. Ref: dandi/dandi-archive#475 . And my stand is obvious: I would like to keep the "original" explicitly present, and if it needs to be updated/migrated, to do so explicitly but in a way that keeps the original available. Otherwise, errors in migrations etc. would be hard(er) to remediate (since they might be buried in the chain of commits/migrations; at least here everything will be under git).

And if the code which is used to read the schema resides in the same repo, then we really need just <version>/orig and an optional <version>/migrated, where /migrated would be the one the current version of the library reads; if it needs to be re-migrated, that would be done into the same subfolder. There should also be an indicator somewhere of which version of the schema it was migrated to, i.e. "which structural schema of the schema ;)" it uses.
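
A minimal sketch of that layout and loading rule, assuming the hypothetical directory names above (orig/ and migrated/ per released version):

    # 1.6.0/
    #     orig/       # schema exactly as released with bids-specification 1.6.0
    #     migrated/   # optional: same content upgraded to the current structure
    from pathlib import Path

    def schema_dir_for(version, root="."):
        """Prefer the migrated copy when present, otherwise fall back to orig."""
        base = Path(root) / version
        migrated = base / "migrated"
        return migrated if migrated.is_dir() else base / "orig"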

@yarikoptic
Collaborator Author

Great to hear that there is more interest/work ongoing!

they hope to integrate the new code into pybids directly

Forgot to mention: from what I have seen, this sounds more like the birth of an external project; as more time goes by, it will only become more difficult to "merge" it into another one (e.g. pybids).

@erdalkaraca
Collaborator

Hi Yaroslav,
I think what Taylor means is contributing back to the pybids project. Merging back will not be possible since it is not a fork of pybids and the approaches are also completely different. We will be experimenting with various technologies to find out the best approach.
For example, I just finished an initial XSD schema (see [1]) taking Taylor's (YAML-based) work into consideration. From the XSD schema I generated Python code (using generateDS.py [2]) which is meant to be an API to the schema. That approach allows using Python's lxml module to query a dataset (in memory).

[1] https://github.com/ANCPLabOldenburg/ancp-bids/tree/xsd-experiment
[2] generateDS.py: http://www.davekuhlman.org/generateDS.html#how-to-build-and-install-it
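
To make the lxml idea concrete, here is a toy, self-contained illustration (not the ancp-bids code; element and attribute names are made up) of querying an in-memory XML representation of a dataset with XPath:

    from lxml import etree  # assumes lxml is installed

    dataset = etree.fromstring(
        "<dataset>"
        "  <subject id='01'><session id='1'/><session id='2'/></subject>"
        "  <subject id='02'><session id='1'/></subject>"
        "</dataset>"
    )

    # XPath query: all session ids of subject 01
    print(dataset.xpath("subject[@id='01']/session/@id"))  # ['1', '2']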

@TheChymera
Collaborator

I'm currently working on setting up a smaller (i.e. both in terms of code and dependencies) regex-based validator.

Given that the schema is being separated, this could be pointed at a specific schema package version, or at extended schemas, e.g. for people working on new kinds of data and trying to develop the standard further.

This is the code thus far:
https://github.com/TheChymera/bids-specification/tree/validator-xs

And this should be tracking the diff:
#969
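
As a rough sketch of the regex-based approach (the patterns below are hand-written examples, not generated from the real schema), filename rules can be compiled into regular expressions and dataset paths matched against them:

    import re

    PATTERNS = [
        # e.g. sub-01/anat/sub-01_T1w.nii.gz
        re.compile(r"^sub-(?P<subject>[0-9a-zA-Z]+)/anat/"
                   r"sub-(?P=subject)_(T1w|T2w)\.nii(\.gz)?$"),
        # e.g. the top-level participants.tsv
        re.compile(r"^participants\.tsv$"),
    ]

    def is_valid_path(relpath):
        return any(p.match(relpath) for p in PATTERNS)

    print(is_valid_path("sub-01/anat/sub-01_T1w.nii.gz"))  # True
    print(is_valid_path("sub-01/anat/sub-02_T1w.nii.gz"))  # False (subject mismatch)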

@yarikoptic

@TheChymera
Collaborator

@yarikoptic ok, so this basically works as per 1fc9b95.
This is an example validation report for DANDI:000108: https://ppb.chymera.eu/1d5ef8.log

One last issue I have is that I can't find the representation of *_sessions.tsv in the YAML.
I know it's part of BIDS (I put it in my own datasets and it gets auto-created by my Bruker-BIDS parser), but it doesn't seem to be represented, and it is only mentioned once in the comments...
Any ideas?

@TheChymera
Collaborator

TheChymera commented Jan 18, 2022

@yarikoptic regarding *_sessions.tsv, it seems some work on BIDS itself will be required there. We might want to figure out the approach as an integral part of BIDS since, unlike our other edits, this does not consist merely of adding entries; we need a new system for intermediate-level files.

@yarikoptic
Collaborator Author

I think what Taylor means is contributing back to the pybids project.

That is good (I am all for contributing to existing projects). The main concern (as stated elsewhere, TODO: ref) with that approach is that pybids has heavy dependencies (e.g. pandas), which would make it a "questionable" dependency to add to relatively lightweight projects (such as our dandi-cli).

@TheChymera
Collaborator

@yarikoptic we have a fix and can now validate sessions files:
TheChymera@a15ac46
TheChymera/bids-schemadata@c8950ef

Though the official solution is not yet finished: #985 (comment)
