Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRD Data opened in Python appears corrupted #214

Closed
1 of 2 tasks
gabuzi opened this issue Nov 15, 2023 · 18 comments
Closed
1 of 2 tasks

MRD Data opened in Python appears corrupted #214

gabuzi opened this issue Nov 15, 2023 · 18 comments
Assignees

Comments

@gabuzi
Copy link
Contributor

gabuzi commented Nov 15, 2023

Hi all!

I've just taken KomaMRI.jl for a quick test-drive and I noticed that I can't open the resulting ismrmrd files with the ismrmrd-python package. Reason seems to be a second nested level of the <UserParameters> element (see below). I also spotted a potential problem with the gpu user param value (annotated in the snippet below).

I used v0.7.6 with julia 1.9.1

<?xml version="1.0" encoding="utf-8"?>
<ismrmrdHeader xsi:schemaLocation="http://www.ismrm.org/ISMRMRD ismrmrd.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.ismrm.org/ISMRMRD" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 ...
<userParameters>
    <userParameters> <!-- this one here! -->
      <userParameterLong>
        <name>Nblocks</name>
        <value>2195</value>
      </userParameterLong>
      <userParameterLong>
        <name>gpu</name>
        <value>false</value> <!-- this may be another issue as value 'false' is not a valid xs:long value -->
      </userParameterLong>
      <userParameterLong>
        <name>gpu_device</name>
        <value>0</value>
      </userParameterLong>
      <userParameterString>
        <name>precision</name>
        <value>f32</value>
      </userParameterString>
      <userParameterString>
        <name>return_type</name>
        <value>raw</value>
      </userParameterString>
      <userParameterDouble>
        <name>Δt_rf</name>
        <value>5.0e-5</value>
      </userParameterDouble>
      <userParameterLong>
        <name>Nthreads</name>
        <value>1</value>
      </userParameterLong>
      <userParameterString>
        <name>sim_method</name>
        <value>Bloch()</value>
      </userParameterString>
      <userParameterDouble>
        <name>sim_time_sec</name>
        <value>15.131958583</value>
      </userParameterDouble>
      <userParameterDouble>
        <name>Δt</name>
        <value>0.001</value>
      </userParameterDouble>
    </userParameters> <!-- this one here! -->
  </userParameters>
</ismrmrdHeader>

EDIT 1-Dec-2023 (@cncastillo):

Hi I think this issue can be separated in two:

@gabuzi
Copy link
Contributor Author

gabuzi commented Nov 15, 2023

Sorry, I believe I have confused some versions. The UI reported 0.7.6, which I believe is just for the core. I installed the latest release of the package, which overall is v0.7.5, I think (but contains core v0.7.6).

@gabuzi
Copy link
Contributor Author

gabuzi commented Nov 15, 2023

There seems to be more beyond XML:
Individual acquisition headers as opened in Python show the output below.

Unexpectedly, the kspace_encoding_counter_1 ends up in the phase field, and the scan_counter field is always 0. According to comments in https://github.com/cncastillo/KomaMRI.jl/blob/v0.7.5/KomaMRICore/src/io/ISMRMRD.jl, that should not be the case. I may have checked the wrong file, though.

Also some other fields have unexpected values (e.g. flags, number_of_samples,...)

acq_hdr1

version: 0
flags: 170718
measurement_uid: 0
scan_counter: 0
acquisition_time_stamp: 0
physiology_time_stamp: 389939200, 0, 0
number_of_samples: 0
available_channels: 0
active_channels: 0
channel_mask: 4575657225703456832, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
discard_pre: 0
discard_post: 0
center_sample: 0
encoding_space_ref: 0
trajectory_dimensions: 0
sample_time_us: 0.0
position: 1.8367099231598242e-40, 4.713547644449387e-41, 2.3956598546097073e-41
read_dir: 0.0, 0.0, 0.0
phase_dir: 2.2779507836064226e-41, 0.0, 0.0
slice_dir: 0.0, 2.2779507836064226e-41, 0.0
patient_table_position: 0.0, 0.0, 2.2779507836064226e-41
idx: kspace_encode_step_1: 0
kspace_encode_step_2: 0
average: 0
slice: 0
contrast: 0
phase: 0
repetition: 0
set: 0
segment: 0
user: 0, 0, 0, 0, 0, 0, 0, 0

user_int: 0, 0, 0, 0, 0, 0, 0, 0
user_float: 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

acq_hdr2
version: 0
flags: 170718
measurement_uid: 0
scan_counter: 0
acquisition_time_stamp: 65536
physiology_time_stamp: 3266969600, 0, 0
number_of_samples: 0
available_channels: 0
active_channels: 0
channel_mask: 4575657225703456832, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
discard_pre: 0
discard_post: 0
center_sample: 0
encoding_space_ref: 0
trajectory_dimensions: 0
sample_time_us: 0.0
position: 1.8367099231598242e-40, 4.713547644449387e-41, 2.3956598546097073e-41
read_dir: 0.0, 0.0, 0.0
phase_dir: 2.2779507836064226e-41, 0.0, 0.0
slice_dir: 0.0, 2.2779507836064226e-41, 0.0
patient_table_position: 0.0, 0.0, 2.2779507836064226e-41
idx: kspace_encode_step_1: 0
kspace_encode_step_2: 0
average: 0
slice: 0
contrast: 0
phase: 1
repetition: 0
set: 0
segment: 0
user: 0, 0, 0, 0, 0, 0, 0, 0

user_int: 0, 0, 0, 0, 0, 0, 0, 0
user_float: 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

@gabuzi gabuzi changed the title Malformed XML header in ISMRMRD Simulation Data MRD Data opened in Python appears corrupted Nov 15, 2023
@cncastillo
Copy link
Member

cncastillo commented Nov 15, 2023

Hi @gabuzi, thanks for noticing this. I was aware that there could be a problem with the generated .mrd and Python's MRD reader. As when I tried it it also did not work. This helps us fix it.

The problem with the userParameterLong XML header having not-numeric stuff, we can fix easily.

Could you send us the code to open an .mrd in Python? I assume you opened the UI, pressed simulate, and tried opening the resulting .mrd located in the temp folder. Just to try to reproduce this.

@gabuzi
Copy link
Contributor Author

gabuzi commented Nov 15, 2023

Hi @cncastillo
For reproducing:
Yes, I ran 'simulate' with the default options in the GUI and then grabbed the tmp file.
On the Python side: installed ismrmrd 1.14.0 via pip on python 3.10
then, try the following snippet:

import ismrmrd

with ismrmrd.File("Koma_signal.mrd") as f:
    # hdr = f['dataset'].header  # <- this fails because of the xml error
    acquisitions = f['dataset'].acquisitions[:]

acq_headers = [a.getHead() for a in acquisitions]
# now you can look at the individual acq headers
# the encodingCounters of the n-th readout can e.g. be inspected via print(acq_headers[n].idx)

@johannesmayer
Copy link

johannesmayer commented Nov 15, 2023

I had encounered something similar before, when trying to open it using pyismrmrd. When I tried reading it it could not reshape the data as it did not find any.
I went and opened it in hdfview, then the xml header is correct, each of the acquisition headers is also correct (index, timestamp etc.), but only the trajectory and the data have errors:
This is if you open the file
hdfview ./code/KomaMRI.jl/KomaMRIPlots/test/test_files/Koma_signal.mrd
image

@gabuzi
Copy link
Contributor Author

gabuzi commented Nov 15, 2023

@johannesmayer
I just ran some tests on my system. I do see the problematic xml also with hdfview (userParameters element is nested twice).

Regarding the binary acquisition headers, I can confirm that they do look ok with hdfview for data that is problematic to read via Python!
At the same time, our data that we can open fine with python also looks good on hdfview, so python is probably more picky in some regards.

@cncastillo
Copy link
Member

cncastillo commented Nov 15, 2023

Hi could any of you attach a valid MRD to compare?

Edit: I generated one in MATLAB with https://github.com/ismrmrd/ismrmrd/blob/master/examples/matlab/create_dataset.m

@gabuzi
Copy link
Contributor Author

gabuzi commented Nov 15, 2023

I found where the problem is. The ismrmrd spec is quite strict regarding the binary layout of the data. See https://ismrmrd.readthedocs.io/en/latest/mrd_raw_data.html#acquisitionheader. It defines the size of the fields and, crucially, their offset.
The ismrmrd python code makes use of this binary layout and copies the entire header as binary (e.g. https://github.com/ismrmrd/ismrmrd-python/blob/41deff1ef14e35f77047bd9c2519174c5000a0c5/ismrmrd/acquisition.py#L164)

However, hdf5 doesn't necessarily require strict binary layouts if the data is described well, and it seems like the files generated by KomaMRI do insert some padding in the binary output, and a header struct as it appears in hdf as seen from python ends up being 352 bytes instead of the required 340.

The following code illustrates this (the second file is one from mridata.org that loads fine in python):

import ismrmrd

def analyze_dtypes(ismrmrd_ds):
    data_dtype = ds._file['dataset']['data'].dtype
    print("data dtype (includes head)")
    print_dtype(data_dtype)
    head_dtype = ds._file['dataset']['data'][0]['head'].dtype
    print("head dtype")
    print_dtype(head_dtype)

def print_dtype(dtype):
    print(dtype.descr)
    print(dtype.str)

ds = ismrmrd.Dataset("Koma_0.7.5_default_signal.mrd", create_if_needed=False)
print("KomaMRI data")
analyze_dtypes(ds)
ds.close()

ds = ismrmrd.Dataset("52c2fd53-d233-4444-8bfd-7c454240d314.h5", create_if_needed=False)
print("mridata.org data")
analyze_dtypes(ds)
ds.close()

which for me gives

KomaMRI data
data dtype (includes head)
[('head', [('version', '<u2'), ('', '|V6'), ('flags', '<u8'), ('measurement_uid', '<u4'), ('scan_counter', '<u4'), ('acquisition_time_stamp', '<u4'), ('physiology_time_stamp', '<u4', (3,)), ('number_of_samples', '<u2'), ('available_channels', '<u2'), ('active_channels', '<u2'), ('', '|V2'), ('channel_mask', '<u8', (16,)), ('discard_pre', '<u2'), ('discard_post', '<u2'), ('center_sample', '<u2'), ('encoding_space_ref', '<u2'), ('trajectory_dimensions', '<u2'), ('', '|V2'), ('sample_time_us', '<f4'), ('position', '<f4', (3,)), ('read_dir', '<f4', (3,)), ('phase_dir', '<f4', (3,)), ('slice_dir', '<f4', (3,)), ('patient_table_position', '<f4', (3,)), ('idx', [('kspace_encode_step_1', '<u2'), ('kspace_encode_step_2', '<u2'), ('average', '<u2'), ('slice', '<u2'), ('contrast', '<u2'), ('phase', '<u2'), ('repetition', '<u2'), ('set', '<u2'), ('segment', '<u2'), ('user', '<u2', (8,))]), ('', '|V2'), ('user_int', '<i4', (8,)), ('user_float', '<f4', (8,))]), ('traj', ('|O', {'vlen': dtype('<f4')})), ('', '|V8'), ('data', ('|O', {'vlen': dtype('<f4')})), ('', '|V8')]
|V384
head dtype
[('version', '<u2'), ('', '|V6'), ('flags', '<u8'), ('measurement_uid', '<u4'), ('scan_counter', '<u4'), ('acquisition_time_stamp', '<u4'), ('physiology_time_stamp', '<u4', (3,)), ('number_of_samples', '<u2'), ('available_channels', '<u2'), ('active_channels', '<u2'), ('', '|V2'), ('channel_mask', '<u8', (16,)), ('discard_pre', '<u2'), ('discard_post', '<u2'), ('center_sample', '<u2'), ('encoding_space_ref', '<u2'), ('trajectory_dimensions', '<u2'), ('', '|V2'), ('sample_time_us', '<f4'), ('position', '<f4', (3,)), ('read_dir', '<f4', (3,)), ('phase_dir', '<f4', (3,)), ('slice_dir', '<f4', (3,)), ('patient_table_position', '<f4', (3,)), ('idx', [('kspace_encode_step_1', '<u2'), ('kspace_encode_step_2', '<u2'), ('average', '<u2'), ('slice', '<u2'), ('contrast', '<u2'), ('phase', '<u2'), ('repetition', '<u2'), ('set', '<u2'), ('segment', '<u2'), ('user', '<u2', (8,))]), ('', '|V2'), ('user_int', '<i4', (8,)), ('user_float', '<f4', (8,))]
|V352
mridata.org data
data dtype (includes head)
[('head', [('version', '<u2'), ('flags', '<u8'), ('measurement_uid', '<u4'), ('scan_counter', '<u4'), ('acquisition_time_stamp', '<u4'), ('physiology_time_stamp', '<u4', (3,)), ('number_of_samples', '<u2'), ('available_channels', '<u2'), ('active_channels', '<u2'), ('channel_mask', '<u8', (16,)), ('discard_pre', '<u2'), ('discard_post', '<u2'), ('center_sample', '<u2'), ('encoding_space_ref', '<u2'), ('trajectory_dimensions', '<u2'), ('sample_time_us', '<f4'), ('position', '<f4', (3,)), ('read_dir', '<f4', (3,)), ('phase_dir', '<f4', (3,)), ('slice_dir', '<f4', (3,)), ('patient_table_position', '<f4', (3,)), ('idx', [('kspace_encode_step_1', '<u2'), ('kspace_encode_step_2', '<u2'), ('average', '<u2'), ('slice', '<u2'), ('contrast', '<u2'), ('phase', '<u2'), ('repetition', '<u2'), ('set', '<u2'), ('segment', '<u2'), ('user', '<u2', (8,))]), ('user_int', '<i4', (8,)), ('user_float', '<f4', (8,))]), ('', '|V4'), ('traj', ('|O', {'vlen': dtype('<f4')})), ('', '|V8'), ('data', ('|O', {'vlen': dtype('<f4')})), ('', '|V8')]
|V376
head dtype
[('version', '<u2'), ('flags', '<u8'), ('measurement_uid', '<u4'), ('scan_counter', '<u4'), ('acquisition_time_stamp', '<u4'), ('physiology_time_stamp', '<u4', (3,)), ('number_of_samples', '<u2'), ('available_channels', '<u2'), ('active_channels', '<u2'), ('channel_mask', '<u8', (16,)), ('discard_pre', '<u2'), ('discard_post', '<u2'), ('center_sample', '<u2'), ('encoding_space_ref', '<u2'), ('trajectory_dimensions', '<u2'), ('sample_time_us', '<f4'), ('position', '<f4', (3,)), ('read_dir', '<f4', (3,)), ('phase_dir', '<f4', (3,)), ('slice_dir', '<f4', (3,)), ('patient_table_position', '<f4', (3,)), ('idx', [('kspace_encode_step_1', '<u2'), ('kspace_encode_step_2', '<u2'), ('average', '<u2'), ('slice', '<u2'), ('contrast', '<u2'), ('phase', '<u2'), ('repetition', '<u2'), ('set', '<u2'), ('segment', '<u2'), ('user', '<u2', (8,))]), ('user_int', '<i4', (8,)), ('user_float', '<f4', (8,))]
|V340

We can find once ('', '|V6') and three times ('', '|V2') in the head dtype. This makes up for the extra 12 bytes.
The entire "data" dtype is only 8 bytes bigger for the KomaMRI data (384 vs 376), so there is some change in padding as well, and I'm not sure if it matters.

@cncastillo
Copy link
Member

cncastillo commented Nov 15, 2023

It could be a bug in MRIFiles. I took a look to the function get_hdf5type_acquisitionheader in MRIFiles/src/ISMRMRD/HDF5Types.jl, and I see the same byte difference shown by @gabuzi. I think the culprit is the function fieldoffset that is incorrectly calculating the byte offsets for the fields of MRIFiles.AcquisitionHeaderImmutable (probably also incorrect for other structs).

Code:

using MRIFiles
structinfo(T) = [((i!=1 ? sum(sizeof(fieldtype(T,j)) for j=1:i-1) : 0), sizeof(fieldtype(T,i)), fieldname(T,i), fieldtype(T,i)) for i = 1:fieldcount(T)]
myfieldoffset(T, i) = i!=1 ? sum(sizeof(fieldtype(T,j)) for j=1:i-1) : 0
a = fieldoffset.(Ref(MRIFiles.AcquisitionHeaderImmutable),1:fieldcount(MRIFiles.AcquisitionHeaderImmutable)) .|> Int
b = myfieldoffset.(Ref(MRIFiles.AcquisitionHeaderImmutable),1:fieldcount(MRIFiles.AcquisitionHeaderImmutable)) .|> Int
[b a [0; diff(a.-b; dims=1)]]

Output:

julia> [b a [0; diff(a.-b; dims=1)]]
24×3 Matrix{Int64}:
   0    0  0
   2    8  6       # <- |V6
  10   16  0
  14   20  0
  18   24  0
  22   28  0
  34   40  0
  36   42  0
  38   44  0
  40   48  2       # <- |V2
 168  176  0
 170  178  0
 172  180  0
 174  182  0
 176  184  0
 178  188  2       # <- |V2
 182  192  0
 194  204  0
 206  216  0
 218  228  0
 230  240  0
 242  252  0
 276  288  2       # <- |V2
 308  320  0

The byte offsets i calculate correspond to the ones described in the official MRD docs (AcquisitionHeader)

# byte offset, size in bytes, field, fieldtype
julia> structinfo(MRIFiles.AcquisitionHeaderImmutable)
24-element Vector{Tuple{Int64, Int64, Symbol, DataType}}:
 (0, 2, :version, UInt16)
 (2, 8, :flags, UInt64)
 (10, 4, :measurement_uid, UInt32)
 (14, 4, :scan_counter, UInt32)
 (18, 4, :acquisition_time_stamp, UInt32)
 (22, 12, :physiology_time_stamp, Tuple{UInt32, UInt32, UInt32})
 (34, 2, :number_of_samples, UInt16)
 (36, 2, :available_channels, UInt16)
 (38, 2, :active_channels, UInt16)
 (40, 128, :channel_mask, NTuple{16, UInt64})
 (168, 2, :discard_pre, UInt16)
 (170, 2, :discard_post, UInt16)
 (172, 2, :center_sample, UInt16)
 (174, 2, :encoding_space_ref, UInt16)
 (176, 2, :trajectory_dimensions, UInt16)
 (178, 4, :sample_time_us, Float32)
 (182, 12, :position, Tuple{Float32, Float32, Float32})
 (194, 12, :read_dir, Tuple{Float32, Float32, Float32})
 (206, 12, :phase_dir, Tuple{Float32, Float32, Float32})
 (218, 12, :slice_dir, Tuple{Float32, Float32, Float32})
 (230, 12, :patient_table_position, Tuple{Float32, Float32, Float32})
 (242, 34, :idx, EncodingCountersImmutable)
 (276, 32, :user_int, NTuple{8, Int32})
 (308, 32, :user_float, NTuple{8, Float32})

I will keep looking at it tomorrow :)

@tknopp
Copy link

tknopp commented Dec 2, 2023

Oh, this is really bad. I have not seen https://ismrmrd.readthedocs.io/en/latest/mrd_raw_data.html#acquisitionheader and assumed that the offsets correspond to the in-memory layout of the C struct (which in Julia is the immutable header type). And when I worked on this, I actually have seen the padded data in some public MRM files.

So am I reading https://ismrmrd.readthedocs.io/en/latest/mrd_raw_data.html#acquisitionheader correctly, that the layout is always tight, i.e. that do padding is involved? In that case myfieldoffset seems to be the correct implementation and we then would not need to hardcode the actual table.

One issue though is that I don't understand how we can present the "unpadded" struct in Julia. Does somebody know how this is handled in the C bindings? The C struct will also have such padding.

@tknopp
Copy link

tknopp commented Dec 2, 2023

Looking at https://github.com/ismrmrd/ismrmrd/blob/master/libsrc/serialization.cpp#L11 it seems that the C struct is also serialized directly. I don't get that. Does it mean that the C struct is not padded?

To solve this issue it would be great to ping somebody, how has more experience with the C/C++ implementation and can clarify these questions.

@gabuzi
Copy link
Contributor Author

gabuzi commented Dec 2, 2023

@tknopp Thanks for tuning in!

Looking at https://github.com/ismrmrd/ismrmrd/blob/d364e03d3faa3ca516da7807713b5acc72218a37/include/ismrmrd/ismrmrd.h#L259, especially including up to L315, I'm quite confident that the C/C++ datastructures are not padded. At least not within the AcquisitionHeader struct. They explicitly check at compile time (static_assert) that their struct has the offsets specified in https://ismrmrd.readthedocs.io/en/latest/mrd_raw_data.html#acquisitionheader. They can thus just write their struct to the stream as is (the line referenced by you), and it should end up there with the same offsets.

@tknopp
Copy link

tknopp commented Dec 3, 2023

C structs are padded by default unless you tell the compiler not to do so. But have found the line where this is actually done:
https://github.com/ismrmrd/ismrmrd/blob/d364e03d3faa3ca516da7807713b5acc72218a37/include/ismrmrd/ismrmrd.h#L82C1-L82C13
So with that all is clear.

I already did a little bit of research how we can solve this. Unfortunately we can't disable structural padding. But I think we will simply convert the struct to an UInt8 Array and that should do the trick.

@tknopp
Copy link

tknopp commented Dec 3, 2023

And for reference: In our unit test we use the ISMRMRD data:
https://sourceforge.net/projects/ismrmrd/files/data/

And that seems to be padded. Would be great if someone could have a look and inspect that data. I can't load it with the python ismrmrd library but don't know where the issue is.

So what I would definitely also need to fix the issue is correct data. Ideally again, some cartesian and spiral data. What is also important is that the data is small, so that we can use it in the unit tests.

Edit: I was wrong. The data is not padded but I misinterpreted the way Julia reads it. Everything is fine.

@tknopp
Copy link

tknopp commented Dec 3, 2023

Ok, here is the fix: MagneticResonanceImaging/MRIReco.jl#167

Would be great if someone could test it.

@cncastillo
Copy link
Member

Hi @tknopp, Thanks for looking into this.

I have tested the fix with cartesian and spiral data generated with Koma, and I think it does in fact solve the problem, as I do not see the data being corrupted in Python anymore. I will do further testing, but I believe I will close this issue soon :).

@tknopp
Copy link

tknopp commented Dec 4, 2023

Great, thanks so much for reporting. I did, by the way, already do the releases (including the move of the Limit... types) so you can already rely on that in Koma.

@cncastillo
Copy link
Member

Thanks! we will start using the Limit type in MRIBase now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants