Basically I want to check some assumptions I'm making about concatenation of kerchunk references in VirtualiZarr (for this issue).
I would like to know the following about the reference dict returned when I scan, e.g., a single netCDF file with kerchunk:
a) are there files where I will get back more than one chunk entry per variable for one file?
b) if so, are there files where, instead of all those chunks being the same size, one of them could be smaller (like the final chunk can be in the zarr model)?
In the zarr model (b) is allowed, but that doesn't mean kerchunk ever actually produces it.
If (b) never happens, that simplifies things: I can take the `shape` key in each kerchunk reference dict as describing every chunk of that variable from that file, with no exceptions, and I won't risk unknowingly concatenating arrays into variable-length chunks.
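The concatenation worry above can be made concrete with a small pure-Python sketch. It assumes every file uses the same nominal chunk length along the concatenation axis and counts only the valid elements in each chunk; all function names here are hypothetical and not part of VirtualiZarr or kerchunk.

```python
def chunk_sizes_along_axis(length: int, chunk: int) -> list[int]:
    """Valid-element counts of the chunks covering `length` elements."""
    full, rem = divmod(length, chunk)
    return [chunk] * full + ([rem] if rem else [])


def concatenated_chunk_sizes(lengths: list[int], chunk: int) -> list[int]:
    """Chunk sizes along the concat axis after naively joining each file's chunks."""
    sizes: list[int] = []
    for length in lengths:
        sizes.extend(chunk_sizes_along_axis(length, chunk))
    return sizes


def is_regular_grid(sizes: list[int]) -> bool:
    """True if all chunks but the last share one size and the last is no larger.

    A regular zarr chunk grid can only express this one kind of irregularity.
    """
    if not sizes:
        return True
    first = sizes[0]
    return all(s == first for s in sizes[:-1]) and sizes[-1] <= first


# Two files whose lengths are exact multiples of the chunk: still a regular grid.
print(is_regular_grid(concatenated_chunk_sizes([8, 8], 4)))    # True
# Two files of length 10 with chunk 4: a partial chunk lands mid-array.
print(concatenated_chunk_sizes([10, 10], 4))                   # [4, 4, 2, 4, 4, 2]
print(is_regular_grid(concatenated_chunk_sizes([10, 10], 4)))  # False
```

Note that a partial chunk in the final file only is harmless (lengths `[8, 10]` give `[4, 4, 4, 4, 2]`, which is still regular); it is a partial chunk in any earlier file that forces variable-length chunks.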
> a) are there files where I will get back more than one chunk entry per variable for one file?

Yes. HDF5 variables are often chunked much like zarr variables, though often with smaller chunks. In netCDF3 (which I don't think you meant), there is only limited chunking, along the append (record) axis.
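To see how many chunk entries a single file produced for each variable, one can count the non-metadata keys per variable path. This is a minimal sketch assuming kerchunk's version-1 reference layout (a top-level `"refs"` mapping whose chunk keys look like `"<var>/<i>.<j>"` and whose metadata keys start with a dot); the helper name is hypothetical and the example references are made up and abbreviated.

```python
from collections import Counter


def chunk_entry_counts(refs: dict) -> Counter:
    """Count chunk references per variable path in a kerchunk-style refs mapping."""
    counts: Counter = Counter()
    for key in refs:
        var, _, leaf = key.rpartition("/")
        # Skip root-level keys and metadata keys such as .zgroup, .zattrs, .zarray.
        if not var or leaf.startswith("."):
            continue
        counts[var] += 1
    return counts


# Hypothetical references for one file: "time" fits in one chunk, "air" needs three.
example = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "time/.zarray": '{"shape": [10], "chunks": [10]}',
        "time/0": ["example.nc", 1000, 80],
        "air/.zarray": '{"shape": [12, 4], "chunks": [4, 4]}',
        "air/0.0": ["example.nc", 2000, 512],
        "air/1.0": ["example.nc", 2512, 512],
        "air/2.0": ["example.nc", 3024, 512],
    },
}
print(dict(chunk_entry_counts(example["refs"])))  # {'time': 1, 'air': 3}
```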
> b) if so, are there files where, instead of all those chunks being the same size, one of them could be smaller (like the final chunk can be in the zarr model)?

Yes. A variable's shape need not be an exact multiple of the chunk size, in which case the last chunk is incomplete. I don't remember offhand whether HDF5 stores a full-size chunk there (as zarr does); I suppose so. It should be easy to test!
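Whether a variable has an incomplete final chunk can be read off its metadata alone. A minimal sketch, assuming the variable's `.zarray` entry in the reference dict is a JSON string carrying zarr v2 `shape` and `chunks` fields; the helper name is hypothetical. It only does the arithmetic and says nothing about how many bytes HDF5 actually stores for that last chunk, which is what the test suggested above would settle.

```python
import json


def last_chunk_report(zarray_json: str) -> list[tuple[int, int, bool]]:
    """For each axis, return (number of chunks, valid length of last chunk, is_partial)."""
    meta = json.loads(zarray_json)
    report = []
    for length, chunk in zip(meta["shape"], meta["chunks"]):
        n_chunks = -(-length // chunk)  # ceiling division
        last = length - (n_chunks - 1) * chunk if n_chunks else 0
        report.append((n_chunks, last, 0 < last < chunk))
    return report


# Abbreviated, made-up .zarray: axis 0 (length 10, chunk 4) ends in a 2-element chunk.
print(last_chunk_report('{"shape": [10, 6], "chunks": [4, 3]}'))
# [(3, 2, True), (2, 3, False)]
```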
The last-chunk issue, which prevents virtually concatenating variables across files, is one of the drivers for ZEP003.