Using Kerchunk to combine datasets into datatrees #357
I've been exploring using datatree to build collections of forecast datasets. I've been able to take a datatree NetCDF file and run SingleHdf5ToZarr on it, so Kerchunk already has some ability to work with datatrees. Before I settle too much on the API I'm working on, I'd like to figure out whether it would be possible to perform a MultiZarrToZarr-like operation that, instead of aggregating already individually kerchunked datasets into a single dataset, puts each dataset at a specific position in the datatree. I imagine being able to do something like the following, then load the resulting JSON as a datatree and use the library I'm playing with to slice and dice the underlying datasets:

```python
# MultiZarrToZarrTree is a proposed class; it does not exist in kerchunk.
mzz = MultiZarrToZarrTree(
    {  # path in tree to place: dataset
        "forecast_reference_time/2023-09-04T00-00-00": "./forecast_2023-09-04.json",
        "forecast_reference_time/2023-09-05T00-00-00": "./forecast_2023-09-05.json",
        "forecast_reference_time/2023-09-06T00-00-00": "./forecast_2023-09-06.json",
    },
    # various kwargs in common with MultiZarrToZarr, like remote_protocol and remote_options
)
d = mzz.translate()
```
This is definitely something that kerchunk can do, but not with MultiZarrToZarr. Since the structure of keys in a single Zarr store follows the filesystem hierarchy, it is enough to rename the keys to the desired new hierarchy. For example, something like the following would do:
This makes a dictionary of all the references, which you can then save; alternatively, you could fill in a Lazy/parquet mapper, or handle a number of more complicated scenarios. The function … The reason MultiZarrToZarr doesn't do this is that it looks at the coordinates in each dataset and tries to stack the data chunks along the various dimensions, which only really makes sense in the netCDF model of a single layer of arrays. We could write functions like the one above to do more general combinations of multiple datasets in various schemes.
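A minimal sketch of the key-renaming idea described above, as an illustration rather than the snippet from the original reply. It assumes each input is a kerchunk reference set in either the flat version-0 layout or the `{"version": 1, "refs": ..., "templates": ...}` layout, with no `gen` entries, and that every group in the new tree needs a Zarr v2 `.zgroup` marker. The names `combine_into_tree` and `_load_refs` are hypothetical, not part of kerchunk.

```python
import json

# Zarr v2 group marker; every group in the new tree needs one.
ZGROUP = json.dumps({"zarr_format": 2})


def _load_refs(source):
    """Return (refs, templates) from a kerchunk reference set.

    `source` is a path to a reference JSON file or an already-loaded dict,
    in either the flat version-0 layout or the version-1 layout.
    Assumption: version-1 inputs carry no "gen" entries.
    """
    if isinstance(source, str):
        with open(source) as f:
            source = json.load(f)
    if source.get("version") == 1:
        if source.get("gen"):
            raise NotImplementedError("'gen' entries are not handled in this sketch")
        return dict(source["refs"]), dict(source.get("templates") or {})
    return dict(source), {}


def combine_into_tree(sources):
    """Nest several reference sets under group paths of one reference set.

    `sources` maps a tree path such as
    "forecast_reference_time/2023-09-04T00-00-00" to a reference file or dict.
    Returns a version-1 reference dict that can be saved with json.dump.
    """
    refs = {".zgroup": ZGROUP}  # root group
    templates = {}
    for tree_path, source in sources.items():
        tree_path = tree_path.strip("/")
        sub_refs, sub_templates = _load_refs(source)
        # Templates are global to a reference set, so names must not clash.
        for name, value in sub_templates.items():
            if templates.setdefault(name, value) != value:
                raise ValueError(f"conflicting template {name!r}")
        # Mark every ancestor group (and the leaf, in case the input lacks
        # its own .zgroup) so the hierarchy is a valid Zarr v2 tree.
        parts = tree_path.split("/")
        for i in range(1, len(parts) + 1):
            refs.setdefault("/".join(parts[:i]) + "/.zgroup", ZGROUP)
        # The actual "rename": prefix every key with the tree path.
        # Chunk references and inline metadata values are copied unchanged.
        for key, value in sub_refs.items():
            if key == ".zmetadata":
                continue  # consolidated metadata would describe the old layout
            refs[f"{tree_path}/{key}"] = value
    out = {"version": 1, "refs": refs}
    if templates:
        out["templates"] = templates
    return out
```

For the layout in the question, passing the same path-to-JSON mapping to `combine_into_tree` and writing the result with `json.dump` gives a single reference file. Options such as `remote_protocol` and `remote_options` are not needed at this stage; they are supplied when the file is opened through fsspec's `reference://` filesystem, after which it should in principle be readable as a datatree via a Zarr backend.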
I did something like that here: https://ncar.github.io/esds/posts/2023/kerchunk-mom6/

I think it would be nice to add a …