Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return layout subset #122

Open
Remi-Gau opened this issue Feb 4, 2021 · 13 comments · Fixed by #124
Open

Return layout subset #122

Remi-Gau opened this issue Feb 4, 2021 · 13 comments · Fixed by #124

Comments

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented Feb 4, 2021

Make query return subset of layout instead of flat list

In current state querry returns flat lists of found data, subjects etc. This may create some difficulties during data processing, for ex. if one need to access to metadata in json file, he need to reparse the filenames.

I propose that querry returns instead the either subset subject field, or full layout containing only selected data files. Retrieve a flat list of files from it is trivial, so it will nod add complexity if one just need data files, but it will provide additional metadata together with data files, if they are needed.

Originally posted by @nbeliy in #60 (comment)

@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Feb 4, 2021

In some modalities (meg/eeg. func) images are accompanied by tabular data, like events.tsv. In actual state, one need to extract data file names and then replace the suffix to access these tabular files.

Technically the content of those files is loaded in the layout:

subject.func(end).meta = bids.util.tsvread(fullfile(pth, file_list{i})); % ?

But it seems that we don't have the code in query to access it:

BIDS = bids.layout(fullfile(pth_bids_example, 'ds001'));

bids.query(BIDS, 'metadata', 'sub', '01', 'run', '01', 'type', 'events')

ans = 
  struct with no fields.
  
 % But we get this:

 BIDS.subjects(1).func(4).meta

ans = 

  struct with fields:

                   onset: [158×1 double]
                duration: [158×1 double]
              trial_type: {158×1 cell}
             cash_demean: [158×1 double]
    control_pumps_demean: [158×1 double]
          explode_demean: [158×1 double]
            pumps_demean: [158×1 double]
           response_time: [158×1 double]

(Will add a test for this, because I doubt this is the intended behavior.)

So I would not necessarily think this is needed for events but definitely, we need a way to deal with the intendedFor for fieldmaps, especially that it is being reused for other data types as well (ASL, PET...) .

See #66

@nbeliy
Copy link
Collaborator

nbeliy commented Feb 4, 2021

Technically the content of those files is loaded in the layout:

Yes, I've seen it. But you load them only for some modalities, and parce it. I propose to treat them as independent data files, for example querry for func will return something like:

sub-xxx/ses-yyy/sub-xxx_ses-yyy_task-zzz_bold.nii.gz
sub-xxx/ses-yyy/sub-xxx_ses-yyy_task-zzz_events.tsv

So if one want to run some scripts on all fMRI, he has the list of both image and events, instead to try to generate filename from image name.

@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Feb 4, 2021

In current state querry returns flat lists of found data, subjects etc. This may create some difficulties during data processing, for ex. if one need to access to metadata in json file, he need to reparse the filenames.

I am not sure I follow.

From the test for query:

md = bids.query(BIDS, 'metadata', ...

BIDS = bids.layout(fullfile(pth_bids_example, 'ds007'));

  md = bids.query(BIDS, 'metadata', ...
                  'sub', '05', ...
                  'run', '02', ...
                  'task', 'stopsignalwithmanualresponse', ...
                  'type', 'bold');

  assert(isstruct(md) & isfield(md, 'RepetitionTime') & isfield(md, 'TaskName'));
  assert(md.RepetitionTime == 2);
  assert(strcmp(md.TaskName, 'stop signal with manual response'));

Sure there are still some issues with the getting metadata (#23, #79) and sure some reparsing is happening under the hood when querying for metada, but I am not sure to see where the issue is.

@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Feb 4, 2021

Technically the content of those files is loaded in the layout:

Yes, I've seen it. But you load them only for some modalities, and parce it. I propose to treat them as independent data files, for example querry for func will return something like:

sub-xxx/ses-yyy/sub-xxx_ses-yyy_task-zzz_bold.nii.gz
sub-xxx/ses-yyy/sub-xxx_ses-yyy_task-zzz_events.tsv

So if one want to run some scripts on all fMRI, he has the list of both image and events, instead to try to generate filename from image name.

But you can already do that, no?

BIDS = bids.layout(fullfile(pth_bids_example, 'ds001'));
bids.query(BIDS, 'data', 'sub', '01', 'task', 'balloonanalogrisktask', 'run', '01')

ans =

  2×1 cell array

    '/home/remi/github/BIDS-matlab/tests/bids-examples/ds001/sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz'
    '/home/remi/github/BIDS-matlab/tests/bids-examples/ds001/sub-01/func/sub-01_task-balloonanalogrisktask_run-01_events.tsv'

@nbeliy
Copy link
Collaborator

nbeliy commented Feb 4, 2021

Simple example: someone want to process T1w image with his script, which also needs some metadata from json.

In actual state:
He get the list of files using querry, and retrive needed parameters eithr using second querry, or just changing extention and parcing json file in his script.

If query returns list of subject structure:
he loops over these structures wich contains the path to file and the path to json (and probably other useful information).

@nbeliy
Copy link
Collaborator

nbeliy commented Feb 4, 2021

But you can already do that, no?

Probably (I didn't looked to much in query part).
But it works only if you define it for given modality. If someone want to do a task during a structural mri (I don't discuss the usefulness!) the query will ignore the events.tsv, because you impose that tsv files exists only for some modalities.

@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Feb 4, 2021

In actual state layout function checks for comliency with current (hardcoded) bids schema.

Technically we are not "yet" following the schema (see #104) but I see what you mean. Official schema support is on its way though see PR #121)

Which will automatically made it unusable if dataset do not follow defined schema for whatever reason. For example if one uses an old bidsified dataset, not compatible with actual state.

Yup we actually already have this problem in the PR #121 as the FLASH suffix is now deprecated and not encoded in the BIDS schema.

So I propose to move the checks for modalities, file extentions, entities to validate function, and make layout working on any dataset which follows bids syntax: sub-<>_(<key>-<value>)*_<suffix>.<extention>.

Actually, we try to leave the extensive validation to the validator but there is a bit of entity list validation at parsing.

It will not only simplify the code (all complex ifs will move to validate), but also make core code independent of version of BIDS, simplifying the maintenance.

Actually using the official schema is already a way to simplify the code (but work in progress). The issue of how to deal with several BIDS schema version though is an important one.

One thing that could be done is to use "tolerant" option of the "layout" to actually allow parsing of dataset with almost no rule on filenaming except the sub-<>_(<key>-<value>)*_<suffix>.<extention> you mentioned.

@nbeliy
Copy link
Collaborator

nbeliy commented Feb 4, 2021

One thing that could be done is to use "tolerant" option of the "layout" to actually allow parsing of dataset with almost no rule on filenaming except the sub-<>(-)*. you mentioned.

Or just load the layout and run validation afterwards. It will significally simplify the loading part, and allow more complex verifications in validator part. For example check for magintude1 if phasediff is present, or presence of specific json fields and values. The main functional code will remain simple not chainging (once debugged), while the validator can increase the complexity folowwing the BIDS requirements. Validator can even automatically launched after the layout.

@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Feb 4, 2021

One thing that could be done is to use "tolerant" option of the "layout" to actually allow parsing of dataset with almost no rule on filenaming except the sub-<>(-)*. you mentioned.

Or just load the layout and run validation afterwards. It will significally simplify the loading part, and allow more complex verifications in validator part. For example check for magintude1 if phasediff is present, or presence of specific json fields and values. The main functional code will remain simple not chainging (once debugged), while the validator can increase the complexity folowwing the BIDS requirements. Validator can even automatically launched after the layout.

I am actually reluctant to have layout return a structure that is not about a valid BIDS dataset by default. I would rather have the non-bids-compliance be somethings users have to put extra effort to opt-in.

Having defaults follow "best practice" is actually a good way to nudge good behaviors.

@nbeliy
Copy link
Collaborator

nbeliy commented Feb 4, 2021

I am actually reluctant to have layout return a structure that is not about a valid BIDS dataset by default. I would rather have the non-bids-compliance be somethings users have to put extra effort to opt-in.

Easy:

function result = layout(...)
 <cration of layout in ''tolerant way'>

 if ~result.validate()
    error(...)
end

@nbeliy
Copy link
Collaborator

nbeliy commented Feb 4, 2021

Having defaults follow "best practice" is actually a good way to nudge good behaviors.

Only if the best practices are not restrictive and not changing every month. Integration of new modalities can take years (qMRI and PET).

@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Feb 5, 2021

@nbeliy moved your comment to a different issue. :-)

@bids-standard bids-standard deleted a comment from nbeliy Feb 5, 2021
@Remi-Gau Remi-Gau linked a pull request Feb 7, 2021 that will close this issue
19 tasks
@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Feb 7, 2021

The PR #124 will at least start taking care of the first point and allow to parse datasets in a schemaless fashion.

Just need to switch the tolerant parameter of layout to true.

https://github.com/Remi-Gau/bids-matlab/blob/5a6a598d495d2189964b4dfb0f020f75e2d4be58/tests/test_layout_derivatives.m#L37

@Remi-Gau Remi-Gau changed the title Generalize layout / Return layout subset / Associated files Return layout subset May 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Wish list
Development

Successfully merging a pull request may close this issue.

2 participants