
[Bug]: Zarr 2.18.0 with Blosc #192

Closed
3 tasks done
CodyCBakerPhD opened this issue May 8, 2024 · 5 comments
Labels: `category: bug` (errors in the code or code behavior), `priority: medium` (non-critical problem and/or affecting only a small set of users)

Comments

@CodyCBakerPhD (Contributor)

What happened?

Just encountered a test failure in NeuroConv due to the latest Zarr release on May 7.

Full log: https://github.com/catalystneuro/neuroconv/actions/runs/9005172770/job/24739878400

Including the test case below; it should have most of what you need to reproduce.

Wanted to check whether this has anything to do with how the file is being read on the hdmf-zarr side, or otherwise just to make y'all aware of the issue.

Steps to Reproduce

tmpdir = local('/tmp/pytest-of-runner/pytest-0/popen-gw1/test_simple_time_series_zarr_g0')
integer_array = array([[   606,  22977,  27598, ...,  21453,  14831,  29962],
       [-26530,  -9155,  -6666, ...,  18490,  -6943,   1...5954, -21319, ...,  -8983, -30074, -24446],
       [-30841, -12815,  28599, ...,  24069, -15762,  -3284]], dtype=int16)
case_name = 'generic'
iterator = <class 'neuroconv.tools.hdmf.SliceableDataChunkIterator'>
iterator_options = {}, backend = 'zarr'

    @pytest.mark.parametrize(
        "case_name,iterator,iterator_options",
        [
            ("unwrapped", lambda x: x, dict()),
            ("generic", SliceableDataChunkIterator, dict()),
            ("classic", DataChunkIterator, dict(iter_axis=1, buffer_size=30_000 * 5)),
            # Need to hardcode buffer size in classic case or else it takes forever...
        ],
    )
    @pytest.mark.parametrize("backend", ["hdf5", "zarr"])
    def test_simple_time_series(
        tmpdir: Path,
        integer_array: np.ndarray,
        case_name: str,
        iterator: callable,
        iterator_options: dict,
        backend: Literal["hdf5", "zarr"],
    ):
        data = iterator(integer_array, **iterator_options)
    
        nwbfile = mock_NWBFile()
        time_series = mock_TimeSeries(name="TestTimeSeries", data=data)
        nwbfile.add_acquisition(time_series)
    
        backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend=backend)
        dataset_configuration = backend_configuration.dataset_configurations["acquisition/TestTimeSeries/data"]
        configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)
    
        nwbfile_path = str(tmpdir / f"test_configure_defaults_{case_name}_time_series.nwb.{backend}")
        with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="w") as io:
            io.write(nwbfile)
    
        with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="r") as io:
            written_nwbfile = io.read()

Traceback

with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="r") as io:
    > written_nwbfile = io.read()

tests/test_minimal/test_tools/test_backend_and_dataset_configuration/test_helpers/test_configure_backend_defaults.py:74: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/backends/io.py:56: in read
    f_builder = self.read_builder()
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1323: in read_builder
    f_builder = self.__read_group(self.__file, ROOT_NAME)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1388: in __read_group
    sub_builder = self.__read_group(sub_group, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1388: in __read_group
    sub_builder = self.__read_group(sub_group, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1393: in __read_group
    sub_builder = self.__read_dataset(sub_array, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1454: in __read_dataset
    data = zarr_obj[0]
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:800: in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:926: in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:968: in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:1343: in _get_selection
    self._chunk_getitems(
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:2181: in _chunk_getitems
    self._process_chunk(
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:2049: in _process_chunk
    self._compressor.decode(cdata, dest)
numcodecs/blosc.pyx:564: in numcodecs.blosc.Blosc.decode
    ???
numcodecs/blosc.pyx:365: in numcodecs.blosc.decompress
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   ValueError: buffer source array is read-only

Operating System

Windows

Python Executable

Conda

Python Version

3.8

Package Versions

No response

@oruebel (Contributor)

oruebel commented May 8, 2024

From the traceback it looks like this fails when it tries to read the first element of the dataset (data = zarr_obj[0]), and the error occurs in numcodecs rather than Zarr. What confuses me is the error ValueError: buffer source array is read-only, which seems to indicate that Blosc wants write access even when reading from file. I'm wondering whether this may be an issue in Zarr or numcodecs rather than hdmf-zarr.
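[Editorial note on what the error means: Python's buffer protocol distinguishes read-only from writable buffers, and any consumer that requests a writable view of a read-only buffer is rejected. A minimal stdlib-only sketch of that general mechanism (not the actual numcodecs code path, which raises a ValueError from Cython's memoryview coercion instead):]

```python
# A bytes object exposes only a read-only buffer; attempting to write
# through a view of it fails. bytes read from disk behave the same way,
# which is why a decoder demanding write access to its source would fail.
data = bytes(b"compressed-chunk")
view = memoryview(data)
print(view.readonly)  # True

try:
    view[0] = 0  # attempting to write through a read-only view
except TypeError as error:
    print(error)  # cannot modify read-only memory
```

If the affected numcodecs version requires a writable source buffer during decompression, chunk bytes handed over read-only by the store would trigger exactly this class of error.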

A couple of things to try:

  • Change the mode during reading from r to a to see if that fixes the issue, i.e., change this line to with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="a") as io:. If that works, then I think this may be an issue in Zarr.
  • Can you also try reading the file with just Zarr, i.e., open the file and try to read data = zarr_obj[0] from the dataset that causes the issue? If that works, another thing to try is to open with consolidated metadata (because hdmf_zarr uses consolidated metadata by default).

@oruebel added the `category: bug` and `priority: medium` labels on May 8, 2024
@oruebel (Contributor)

oruebel commented May 8, 2024

@mavaylon1 can you take this from here?

@mavaylon1 (Contributor)

@oruebel I can, but I probably won't take a look until the end of next week at the earliest. Does that fit your timeline?

@mavaylon1 (Contributor)

Possibly related: #195

@mavaylon1 (Contributor)

This seems to have been resolved by my fix and release in #195.
