Process/Decode Chunk Issue #419

Open
dwest77a opened this issue Feb 14, 2024 · 5 comments

Comments

@dwest77a
dwest77a commented Feb 14, 2024

I have some NetCDF UKCP data with a variable called "yyyymmdd" that is stored in the Kerchunk file like so:

"yyyymmdd/.zarray": "{\"chunks\":[1,64],\"compressor\":null,\"dtype\":\"|S1\",\"fill_value\":\"IA==\",\"filters\":null,\"order\":\"C\",\"shape\":[3600,64],\"zarr_format\":2}",
"yyyymmdd/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"time\",\"string64\"],\"long_name\":\"yyyymmdd\",\"units\":\"1\"}",
"yyyymmdd/0.0": "19801201",
"yyyymmdd/1.0": "19801202",
"yyyymmdd/2.0": "19801203",
"yyyymmdd/3.0": "19801204",
"yyyymmdd/4.0": "19801205",
"yyyymmdd/5.0": "19801206",
"yyyymmdd/6.0": "19801207",
"yyyymmdd/7.0": "19801208",
"yyyymmdd/8.0": "19801209",
"yyyymmdd/9.0": "19801210",

When decoded I get the error message:
cannot reshape array of size 8 into shape (1,64)

I think this happens because the part of Zarr that decodes the chunk expects a base64-encoded array rather than a string of 8 characters. Alternatively, the dimensions, chunks, or shape are being interpreted incorrectly. How should an array like this be interpreted within Zarr and decoded into a 1 by 64 array when each chunk is an 8-character string?
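The size mismatch behind the error can be reproduced with NumPy alone. This is a minimal sketch, not the Zarr code path itself: it assumes the inline reference "19801201" reaches the reshape step as 8 raw bytes, and it takes the chunk shape and dtype from the .zarray entry above.

```python
import numpy as np

chunk_shape = (1, 64)   # "chunks" from yyyymmdd/.zarray
raw = b"19801201"       # assumed raw bytes of the inline reference yyyymmdd/0.0

# With dtype |S1 every byte is one element, so this buffer holds 8 elements.
elements = np.frombuffer(raw, dtype="|S1")
expected = int(np.prod(chunk_shape))  # a (1, 64) chunk needs 64 elements

try:
    elements.reshape(chunk_shape)
    error_message = None
except ValueError as exc:
    # NumPy refuses to reshape 8 elements into a 64-element chunk.
    error_message = str(exc)

print(elements.size, expected)
print(error_message)
```

The reshape fails because the buffer has 8 elements where the chunk metadata promises 64, independent of any base64 handling.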

@martindurant
Member

The dtype says that this is a field with 1 character per element, and there are 8 characters in each entry. The chunk shape is (1, 64), so each chunk should hold 64 elements, and zarr is right to error. Actually, the values look like filenames, no?

base64 would only require more characters for the same output, so I don't think that's it (although the fill value is suggestive).
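For reference, the fill value "IA==" is the base64 encoding of a single space character. The sketch below decodes it and shows what an 8-character entry would look like if it were padded with that fill byte to the 64 elements the chunk shape requires. The padding step is purely an assumption for illustration; nothing in this thread confirms that the original data is padded this way.

```python
import base64
import numpy as np

# "fill_value" from yyyymmdd/.zarray, decoded from base64.
fill = base64.b64decode("IA==")   # a single space byte

raw = b"19801201"                 # one 8-character entry from the reference file

# Assumption: pad the entry on the right with the fill byte up to 64 bytes.
padded = raw.ljust(64, fill)

# 64 one-byte elements now fit the (1, 64) chunk shape.
row = np.frombuffer(padded, dtype="|S1").reshape(1, 64)

# Recover the date string by stripping the fill bytes again.
date = row.tobytes().rstrip(fill).decode("ascii")
print(repr(fill), row.shape, date)
```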

@dwest77a
Author

I think this set of NetCDFs just has an extra string field for the date (which is unnecessary but still something that the data provider included). The actual files look like huss_rcp85_land-rcm_uk_12km_01_day_19801201-19901130.nc

There are 8 characters in each entry, and each entry is treated as its own chunk. Each chunk is decoded in zarr's Array._process_chunk, which may be unnecessary since these chunks are not base64 encoded. Should this step be skipped for this dtype?
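To illustrate the distinction between plain and base64 inline values, here is a sketch that assumes the convention that an inline reference string is taken as literal bytes unless it starts with a "base64:" prefix. The function name decode_inline_reference is hypothetical, and the prefix rule should be treated as an assumption rather than as the library's actual code.

```python
import base64


def decode_inline_reference(value: str) -> bytes:
    """Hypothetical helper: turn an inline reference string into raw bytes.

    Assumption: values prefixed with "base64:" carry base64-encoded binary,
    and any other string is used as-is.
    """
    prefix = "base64:"
    if value.startswith(prefix):
        return base64.b64decode(value[len(prefix):])
    return value.encode()


plain = decode_inline_reference("19801201")
encoded = decode_inline_reference(
    "base64:" + base64.b64encode(b"19801201").decode("ascii")
)
print(plain, encoded, len(plain))
```

Under that assumption both forms yield the same 8 bytes, so the chunk is 8 elements long either way and still cannot fill a (1, 64) chunk.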

@dwest77a
Author

@martindurant
Member

I got the file, but I won't have time to look until at least tomorrow.

@dwest77a
Author

No problem, this isn't particularly time-sensitive for me at the moment. Thanks for taking the time!
