Return layout subset #122

Remi-Gau · 2021-02-04T14:17:37Z

Make query return subset of layout instead of flat list

In current state querry returns flat lists of found data, subjects etc. This may create some difficulties during data processing, for ex. if one need to access to metadata in json file, he need to reparse the filenames.

I propose that querry returns instead the either subset subject field, or full layout containing only selected data files. Retrieve a flat list of files from it is trivial, so it will nod add complexity if one just need data files, but it will provide additional metadata together with data files, if they are needed.

Originally posted by @nbeliy in #60 (comment)

The text was updated successfully, but these errors were encountered:

Remi-Gau · 2021-02-04T14:26:44Z

In some modalities (meg/eeg. func) images are accompanied by tabular data, like events.tsv. In actual state, one need to extract data file names and then replace the suffix to access these tabular files.

Technically the content of those files is loaded in the layout:

bids-matlab/+bids/layout.m

Line 250 in 6c1dac6

subject.func(end).meta = bids.util.tsvread(fullfile(pth, file_list{i})); % ?

But it seems that we don't have the code in query to access it:

BIDS = bids.layout(fullfile(pth_bids_example, 'ds001'));

bids.query(BIDS, 'metadata', 'sub', '01', 'run', '01', 'type', 'events')

ans = 
  struct with no fields.
  
 % But we get this:

 BIDS.subjects(1).func(4).meta

ans = 

  struct with fields:

                   onset: [158×1 double]
                duration: [158×1 double]
              trial_type: {158×1 cell}
             cash_demean: [158×1 double]
    control_pumps_demean: [158×1 double]
          explode_demean: [158×1 double]
            pumps_demean: [158×1 double]
           response_time: [158×1 double]

(Will add a test for this, because I doubt this is the intended behavior.)

So I would not necessarily think this is needed for events but definitely, we need a way to deal with the intendedFor for fieldmaps, especially that it is being reused for other data types as well (ASL, PET...) .

See #66

nbeliy · 2021-02-04T14:35:00Z

Technically the content of those files is loaded in the layout:

Yes, I've seen it. But you load them only for some modalities, and parce it. I propose to treat them as independent data files, for example querry for func will return something like:

sub-xxx/ses-yyy/sub-xxx_ses-yyy_task-zzz_bold.nii.gz
sub-xxx/ses-yyy/sub-xxx_ses-yyy_task-zzz_events.tsv

So if one want to run some scripts on all fMRI, he has the list of both image and events, instead to try to generate filename from image name.

Remi-Gau · 2021-02-04T14:44:21Z

In current state querry returns flat lists of found data, subjects etc. This may create some difficulties during data processing, for ex. if one need to access to metadata in json file, he need to reparse the filenames.

I am not sure I follow.

From the test for query:

bids-matlab/tests/test_bids_query.m

Line 63 in 6c1dac6

md = bids.query(BIDS, 'metadata', ...

BIDS = bids.layout(fullfile(pth_bids_example, 'ds007'));

  md = bids.query(BIDS, 'metadata', ...
                  'sub', '05', ...
                  'run', '02', ...
                  'task', 'stopsignalwithmanualresponse', ...
                  'type', 'bold');

  assert(isstruct(md) & isfield(md, 'RepetitionTime') & isfield(md, 'TaskName'));
  assert(md.RepetitionTime == 2);
  assert(strcmp(md.TaskName, 'stop signal with manual response'));

Sure there are still some issues with the getting metadata (#23, #79) and sure some reparsing is happening under the hood when querying for metada, but I am not sure to see where the issue is.

Remi-Gau · 2021-02-04T14:51:01Z

Technically the content of those files is loaded in the layout:

Yes, I've seen it. But you load them only for some modalities, and parce it. I propose to treat them as independent data files, for example querry for func will return something like:
sub-xxx/ses-yyy/sub-xxx_ses-yyy_task-zzz_bold.nii.gz
sub-xxx/ses-yyy/sub-xxx_ses-yyy_task-zzz_events.tsv
So if one want to run some scripts on all fMRI, he has the list of both image and events, instead to try to generate filename from image name.

But you can already do that, no?

BIDS = bids.layout(fullfile(pth_bids_example, 'ds001'));
bids.query(BIDS, 'data', 'sub', '01', 'task', 'balloonanalogrisktask', 'run', '01')

ans =

  2×1 cell array

    '/home/remi/github/BIDS-matlab/tests/bids-examples/ds001/sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz'
    '/home/remi/github/BIDS-matlab/tests/bids-examples/ds001/sub-01/func/sub-01_task-balloonanalogrisktask_run-01_events.tsv'

nbeliy · 2021-02-04T14:51:06Z

Simple example: someone want to process T1w image with his script, which also needs some metadata from json.

In actual state:
He get the list of files using querry, and retrive needed parameters eithr using second querry, or just changing extention and parcing json file in his script.

If query returns list of subject structure:
he loops over these structures wich contains the path to file and the path to json (and probably other useful information).

nbeliy · 2021-02-04T14:56:00Z

But you can already do that, no?

Probably (I didn't looked to much in query part).
But it works only if you define it for given modality. If someone want to do a task during a structural mri (I don't discuss the usefulness!) the query will ignore the events.tsv, because you impose that tsv files exists only for some modalities.

Remi-Gau · 2021-02-04T15:20:31Z

In actual state layout function checks for comliency with current (hardcoded) bids schema.

Technically we are not "yet" following the schema (see #104) but I see what you mean. Official schema support is on its way though see PR #121)

Which will automatically made it unusable if dataset do not follow defined schema for whatever reason. For example if one uses an old bidsified dataset, not compatible with actual state.

Yup we actually already have this problem in the PR #121 as the FLASH suffix is now deprecated and not encoded in the BIDS schema.

So I propose to move the checks for modalities, file extentions, entities to validate function, and make layout working on any dataset which follows bids syntax: sub-<>_(<key>-<value>)*_<suffix>.<extention>.

Actually, we try to leave the extensive validation to the validator but there is a bit of entity list validation at parsing.

It will not only simplify the code (all complex ifs will move to validate), but also make core code independent of version of BIDS, simplifying the maintenance.

Actually using the official schema is already a way to simplify the code (but work in progress). The issue of how to deal with several BIDS schema version though is an important one.

One thing that could be done is to use "tolerant" option of the "layout" to actually allow parsing of dataset with almost no rule on filenaming except the sub-<>_(<key>-<value>)*_<suffix>.<extention> you mentioned.

nbeliy · 2021-02-04T15:35:48Z

One thing that could be done is to use "tolerant" option of the "layout" to actually allow parsing of dataset with almost no rule on filenaming except the sub-<>(-)*. you mentioned.

Or just load the layout and run validation afterwards. It will significally simplify the loading part, and allow more complex verifications in validator part. For example check for magintude1 if phasediff is present, or presence of specific json fields and values. The main functional code will remain simple not chainging (once debugged), while the validator can increase the complexity folowwing the BIDS requirements. Validator can even automatically launched after the layout.

Remi-Gau · 2021-02-04T15:44:43Z

One thing that could be done is to use "tolerant" option of the "layout" to actually allow parsing of dataset with almost no rule on filenaming except the sub-<>(-)*. you mentioned.

Or just load the layout and run validation afterwards. It will significally simplify the loading part, and allow more complex verifications in validator part. For example check for magintude1 if phasediff is present, or presence of specific json fields and values. The main functional code will remain simple not chainging (once debugged), while the validator can increase the complexity folowwing the BIDS requirements. Validator can even automatically launched after the layout.

I am actually reluctant to have layout return a structure that is not about a valid BIDS dataset by default. I would rather have the non-bids-compliance be somethings users have to put extra effort to opt-in.

Having defaults follow "best practice" is actually a good way to nudge good behaviors.

nbeliy · 2021-02-04T15:51:25Z

I am actually reluctant to have layout return a structure that is not about a valid BIDS dataset by default. I would rather have the non-bids-compliance be somethings users have to put extra effort to opt-in.

Easy:

function result = layout(...)
 <cration of layout in ''tolerant way'>

 if ~result.validate()
    error(...)
end

nbeliy · 2021-02-04T15:58:49Z

Having defaults follow "best practice" is actually a good way to nudge good behaviors.

Only if the best practices are not restrictive and not changing every month. Integration of new modalities can take years (qMRI and PET).

Remi-Gau · 2021-02-05T14:39:41Z

@nbeliy moved your comment to a different issue. :-)

Remi-Gau · 2021-02-07T22:21:10Z

The PR #124 will at least start taking care of the first point and allow to parse datasets in a schemaless fashion.

Just need to switch the tolerant parameter of layout to true.

https://github.com/Remi-Gau/bids-matlab/blob/5a6a598d495d2189964b4dfb0f020f75e2d4be58/tests/test_layout_derivatives.m#L37

Remi-Gau mentioned this issue Feb 4, 2021

[ENH] query returns events file metadata and content #123

Closed

Remi-Gau mentioned this issue Feb 5, 2021

Dealing with intendedFor and "file dependencies" #130

Closed

bids-standard deleted a comment from nbeliy Feb 5, 2021

Remi-Gau linked a pull request Feb 7, 2021 that will close this issue

[ENH] implement bids-schema - part 1 #124

Merged

19 tasks

Remi-Gau changed the title ~~Generalize layout / Return layout subset / Associated files~~ Return layout subset May 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return layout subset #122

Return layout subset #122

Remi-Gau commented Feb 4, 2021 •

edited

Loading

Remi-Gau commented Feb 4, 2021

nbeliy commented Feb 4, 2021

Remi-Gau commented Feb 4, 2021

Remi-Gau commented Feb 4, 2021

nbeliy commented Feb 4, 2021

nbeliy commented Feb 4, 2021

Remi-Gau commented Feb 4, 2021

nbeliy commented Feb 4, 2021

Remi-Gau commented Feb 4, 2021

nbeliy commented Feb 4, 2021 •

edited

Loading

nbeliy commented Feb 4, 2021

Remi-Gau commented Feb 5, 2021

Remi-Gau commented Feb 7, 2021

Return layout subset #122

Return layout subset #122

Comments

Remi-Gau commented Feb 4, 2021 • edited Loading

Make query return subset of layout instead of flat list

Remi-Gau commented Feb 4, 2021

nbeliy commented Feb 4, 2021

Remi-Gau commented Feb 4, 2021

Remi-Gau commented Feb 4, 2021

nbeliy commented Feb 4, 2021

nbeliy commented Feb 4, 2021

Remi-Gau commented Feb 4, 2021

nbeliy commented Feb 4, 2021

Remi-Gau commented Feb 4, 2021

nbeliy commented Feb 4, 2021 • edited Loading

nbeliy commented Feb 4, 2021

Remi-Gau commented Feb 5, 2021

Remi-Gau commented Feb 7, 2021

Remi-Gau commented Feb 4, 2021 •

edited

Loading

nbeliy commented Feb 4, 2021 •

edited

Loading