
[Bug]: Zarr 2.18.0 with Blosc #192

Closed
3 tasks done
CodyCBakerPhD opened this issue May 8, 2024 · 5 comments
Labels: `category: bug` (errors in the code or code behavior), `priority: medium` (non-critical problem and/or affecting only a small set of users)

Comments

@CodyCBakerPhD (Contributor)

What happened?

Just encountered a test failure in NeuroConv due to the latest Zarr release on May 7.

Full log: https://github.com/catalystneuro/neuroconv/actions/runs/9005172770/job/24739878400

Including the test case below; it should have most of what you need to reproduce.

Wanted to check whether this has anything to do with how the file is being read on the hdmf-zarr side, or otherwise just to make y'all aware of the issue.

Steps to Reproduce

tmpdir = local('/tmp/pytest-of-runner/pytest-0/popen-gw1/test_simple_time_series_zarr_g0')
integer_array = array([[   606,  22977,  27598, ...,  21453,  14831,  29962],
       [-26530,  -9155,  -6666, ...,  18490,  -6943,   1...5954, -21319, ...,  -8983, -30074, -24446],
       [-30841, -12815,  28599, ...,  24069, -15762,  -3284]], dtype=int16)
case_name = 'generic'
iterator = <class 'neuroconv.tools.hdmf.SliceableDataChunkIterator'>
iterator_options = {}, backend = 'zarr'

    @pytest.mark.parametrize(
        "case_name,iterator,iterator_options",
        [
            ("unwrapped", lambda x: x, dict()),
            ("generic", SliceableDataChunkIterator, dict()),
            ("classic", DataChunkIterator, dict(iter_axis=1, buffer_size=30_000 * 5)),
            # Need to hardcode buffer size in classic case or else it takes forever...
        ],
    )
    @pytest.mark.parametrize("backend", ["hdf5", "zarr"])
    def test_simple_time_series(
        tmpdir: Path,
        integer_array: np.ndarray,
        case_name: str,
        iterator: callable,
        iterator_options: dict,
        backend: Literal["hdf5", "zarr"],
    ):
        data = iterator(integer_array, **iterator_options)
    
        nwbfile = mock_NWBFile()
        time_series = mock_TimeSeries(name="TestTimeSeries", data=data)
        nwbfile.add_acquisition(time_series)
    
        backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend=backend)
        dataset_configuration = backend_configuration.dataset_configurations["acquisition/TestTimeSeries/data"]
        configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)
    
        nwbfile_path = str(tmpdir / f"test_configure_defaults_{case_name}_time_series.nwb.{backend}")
        with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="w") as io:
            io.write(nwbfile)
    
        with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="r") as io:
            written_nwbfile = io.read()

Traceback

with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="r") as io:
    > written_nwbfile = io.read()

tests/test_minimal/test_tools/test_backend_and_dataset_configuration/test_helpers/test_configure_backend_defaults.py:74: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/backends/io.py:56: in read
    f_builder = self.read_builder()
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1323: in read_builder
    f_builder = self.__read_group(self.__file, ROOT_NAME)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1388: in __read_group
    sub_builder = self.__read_group(sub_group, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1388: in __read_group
    sub_builder = self.__read_group(sub_group, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1393: in __read_group
    sub_builder = self.__read_dataset(sub_array, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1454: in __read_dataset
    data = zarr_obj[0]
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:800: in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:926: in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:968: in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:1343: in _get_selection
    self._chunk_getitems(
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:2181: in _chunk_getitems
    self._process_chunk(
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:2049: in _process_chunk
    self._compressor.decode(cdata, dest)
numcodecs/blosc.pyx:564: in numcodecs.blosc.Blosc.decode
    ???
numcodecs/blosc.pyx:365: in numcodecs.blosc.decompress
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   ValueError: buffer source array is read-only

Operating System

Windows

Python Executable

Conda

Python Version

3.8

Package Versions

No response

@oruebel (Contributor)

oruebel commented May 8, 2024

From the traceback it looks like this fails when it tries to read the first element of the dataset (data = zarr_obj[0]), and the error occurs in numcodecs rather than Zarr. What confuses me is the error ValueError: buffer source array is read-only, which seems to indicate that Blosc wants write access even when reading from file. I'm wondering whether this may be an issue in Zarr or numcodecs rather than hdmf-zarr.
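[Editorial note on what the error means: Python's buffer protocol distinguishes read-only from writable buffers, and any consumer that requests a writable view of a read-only buffer is rejected. A minimal stdlib-only sketch of that general mechanism (not the actual numcodecs code path, which raises a ValueError from Cython's memoryview coercion instead):]

```python
# A bytes object exposes only a read-only buffer; attempting to write
# through a view of it fails. bytes read from disk behave the same way,
# which is why a decoder demanding write access to its source would fail.
data = bytes(b"compressed-chunk")
view = memoryview(data)
print(view.readonly)  # True

try:
    view[0] = 0  # attempting to write through a read-only view
except TypeError as error:
    print(error)  # cannot modify read-only memory
```

If the affected numcodecs version requires a writable source buffer during decompression, chunk bytes handed over read-only by the store would trigger exactly this class of error.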

A couple of things to try:

  • Change the mode during reading from r to a to see if that fixes the issue, i.e., change this line to with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="a") as io:. If that works, then I think this may be an issue in Zarr.
  • Can you also try reading the file with just Zarr, i.e., open the file and try to read data = zarr_obj[0] from the dataset that causes the issue? If that works, another thing to try is to open with consolidated metadata (because hdmf_zarr uses consolidated metadata by default).

@oruebel added the `category: bug` and `priority: medium` labels on May 8, 2024
@oruebel (Contributor)

oruebel commented May 8, 2024

@mavaylon1 can you take this from here?

@mavaylon1 (Contributor)

@oruebel I can, but I probably won't take a look until the end of next week at the earliest. Does that fit your timeline?

@mavaylon1 (Contributor)

Possibly related: #195

@mavaylon1 (Contributor)

This seems to have been resolved by my fix and release in #195.
