Round data in staticmaps.nc #231

Open
hboisgon opened this issue Jan 22, 2024 · 3 comments
Labels
enhancement New feature or request needs refinement issue still needs refinement


@hboisgon
Contributor

Kind of request

Changing existing functionality

Enhancement Description

We already have a similar request for staticgeoms, but I think it would be good practice to round all grids in staticmaps.nc as well.
The number of decimals does not make sense, e.g.:

[image: example of grid values with an excessive number of decimals]

Use case

It would produce a smaller staticmaps.nc file. I'm not sure whether it would have any impact on computation speed.

Additional Context

No response

@hboisgon hboisgon added enhancement New feature or request needs refinement issue still needs refinement labels Jan 22, 2024
@Huite

Huite commented Jan 22, 2024

Hi @hboisgon,

I saw an issue like this come up in my mentions earlier, for Wflow.jl: Deltares/Wflow.jl#314

As I mentioned there: you generally don't want to round binary numbers. A float32 always takes 32 bits of memory and a float64 always takes 64 bits, however many decimals you round to. You might get smaller files if you turn compression on, and rounding might help a little since you are reducing the information content (so the compression algorithm can find more redundancy), but you need to turn on compression in either case.
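To illustrate the storage point, here is a minimal sketch. It assumes numpy, uses Python's built-in zlib module as a stand-in for the netCDF4 zlib filter, and runs on random, hypothetical grid values. Rounding a float32 array keeps it float32, so the uncompressed size does not change, and even a "round" value like 0.1 is not short in binary. Only the compressed size shrinks, because the rounded data has far fewer distinct values.

```python
import zlib

import numpy as np

# Hypothetical grid: 100,000 random float32 values in [0, 10).
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 10.0, size=100_000).astype(np.float32)

# Rounding keeps the dtype, so every value still occupies 32 bits.
rounded = np.round(data, 1)
print(data.dtype, rounded.dtype)    # float32 float32
print(data.nbytes, rounded.nbytes)  # 400000 400000

# 0.1 has no exact binary representation; its float32 bit pattern is not "short".
bits_of_0_1 = np.array([0.1], dtype=np.float32).view(np.uint32)[0]
print(hex(int(bits_of_0_1)))        # 0x3dcccccd

# Only the compressed size benefits: fewer distinct values means more redundancy.
raw_size = len(zlib.compress(data.tobytes(), 6))
rounded_size = len(zlib.compress(rounded.tobytes(), 6))
print(raw_size, rounded_size)
```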

But if you're looking to reduce file sizes, I recommend investigating compression instead. NetCDF4 only supports zlib compression out of the box, whereas Zarr, for example, uses Blosc for far more performant compression.

With regard to the physical interpretation: if you want to capture that, you should probably add metadata instead. You could argue that a river width is never more accurate than 1 cm (for example), but that doesn't generalize, e.g. if you're doing computational/numerical experiments.

And in that case you should do proper error propagation! That's stuff like this:
https://github.com/JuliaPhysics/Measurements.jl
https://pythonhosted.org/uncertainties/

And then ideally support it in an xarray package like pint does: https://xarray.dev/blog/introducing-pint-xarray
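What "proper error propagation" means can be sketched in a few lines. This is a hypothetical minimal type, not the API of Measurements.jl or uncertainties: it does first-order (linear) propagation and assumes the input errors are independent. The river width and depth numbers are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measured:
    """Hypothetical minimal value-with-uncertainty type (1-sigma)."""

    value: float
    sigma: float

    def __add__(self, other: "Measured") -> "Measured":
        # Independent errors: absolute uncertainties add in quadrature.
        return Measured(self.value + other.value, math.hypot(self.sigma, other.sigma))

    def __mul__(self, other: "Measured") -> "Measured":
        # First-order propagation for f = a * b with independent a and b:
        # sigma_f**2 = (b * sigma_a)**2 + (a * sigma_b)**2
        return Measured(
            self.value * other.value,
            math.hypot(other.value * self.sigma, self.value * other.sigma),
        )


# Made-up river cross-section: width 25 m +/- 1 cm, depth 1.2 m +/- 5 cm.
width = Measured(25.0, 0.01)
depth = Measured(1.2, 0.05)
area = width * depth
print(f"{area.value:.3f} +/- {area.sigma:.3f} m2")  # 30.000 +/- 1.250 m2
```

Here the 5 cm depth uncertainty, not the 1 cm width uncertainty, determines how many digits of the area are meaningful, which is something a fixed rounding rule in the file cannot capture.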

@shartgring
Collaborator

This may also relate to https://docs.xarray.dev/en/latest/user-guide/io.html#writing-encoded-data. I read online (pydata/xarray#865 and pydata/xarray#1572) that lossy compression is possible and may go hand in hand with rounding the data, since accuracy is guaranteed for a certain number of digits. I guess it is similar to this: https://docs.unidata.ucar.edu/netcdf-c/current/md__media_psf_Home_Desktop_netcdf_releases_v4_9_2_release_netcdf_c_docs_quantize.html
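The quantize approach in the netCDF docs linked above works on the binary representation rather than on decimal digits: it replaces trailing mantissa bits that carry no meaningful information with fixed bits, which zlib can then compress very effectively. Below is a rough numpy sketch of that idea. It does round-to-nearest while keeping a chosen number of mantissa bits. `bitround_float32` and `keepbits` are hypothetical names, and this is not netCDF's implementation.

```python
import zlib

import numpy as np


def bitround_float32(data: np.ndarray, keepbits: int) -> np.ndarray:
    """Round float32 values so only `keepbits` of the 23 mantissa bits survive.

    Sketch only: ties round away from zero in magnitude (not to even) and
    NaN/inf are not treated specially.
    """
    if not 0 <= keepbits < 23:
        raise ValueError("keepbits must be in the range [0, 23)")
    bits = np.ascontiguousarray(data, dtype=np.float32).view(np.uint32)
    drop = 23 - keepbits                        # trailing mantissa bits to clear
    half = np.uint32(1 << (drop - 1))           # half a unit of the last kept bit
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    # Adding `half` before masking rounds to nearest instead of truncating.
    return ((bits + half) & mask).view(np.float32)


# Hypothetical grid with values between 1 and 100.
rng = np.random.default_rng(0)
data = rng.uniform(1.0, 100.0, size=100_000).astype(np.float32)
quantized = bitround_float32(data, keepbits=7)

# Relative error is bounded by 2**-(keepbits + 1) = 2**-8 (about 0.4%).
max_rel_err = float(np.max(np.abs(quantized - data) / data))
print(max_rel_err <= 2.0**-8)  # True

# The cleared bits are zeros, which zlib compresses very well.
print(len(zlib.compress(data.tobytes())), len(zlib.compress(quantized.tobytes())))
```

Keeping 7 of the 23 mantissa bits means the 16 low bits of every value become zero. That is where the compression gain comes from, while the precision loss stays bounded and known.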

I am not sure how this would work with zlib: is it lossy or lossless, or can a combination be used?
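Regarding the combination: a lossy step can come first, with lossless zlib applied on top. The classic CF approach for the lossy step is packing with scale_factor/add_offset into a small integer type, which the xarray "writing encoded data" page linked above exposes through the `dtype`, `scale_factor`, `add_offset` and `_FillValue` encoding keys (with `zlib: True` added for the netCDF4 engine). Below is a numpy-only sketch of the same arithmetic. The helper names, fill value and data are hypothetical.

```python
import zlib

import numpy as np

FILL_VALUE = np.int16(-9999)  # hypothetical _FillValue for the packed variable


def pack_int16(data, scale_factor, add_offset=0.0):
    """Lossy step: CF-style packing of floats into int16 (NaN -> fill value).

    Assumes the scaled values fit in the int16 range and never equal FILL_VALUE.
    """
    packed = np.round((data - add_offset) / scale_factor)
    packed = np.where(np.isnan(packed), FILL_VALUE, packed)
    return packed.astype(np.int16)


def unpack_int16(packed, scale_factor, add_offset=0.0):
    """Inverse of pack_int16: int16 back to float64 with NaN for fill values."""
    values = packed.astype(np.float64) * scale_factor + add_offset
    return np.where(packed == FILL_VALUE, np.nan, values)


# Hypothetical grid of depths in metres, with one missing cell.
rng = np.random.default_rng(0)
depth = rng.uniform(0.0, 100.0, size=100_000)
depth[0] = np.nan

packed = pack_int16(depth, scale_factor=0.01)
restored = unpack_int16(packed, scale_factor=0.01)

# Lossy: the error is at most half the scale_factor.
max_abs_err = float(np.nanmax(np.abs(restored - depth)))

# Lossless: zlib on top returns the packed bytes exactly.
blob = zlib.compress(packed.tobytes())
assert zlib.decompress(blob) == packed.tobytes()

print(max_abs_err, len(zlib.compress(depth.tobytes())), len(blob))
```

The packing bounds the absolute error by half the scale_factor (0.005 m here). The zlib round trip on top changes nothing, so all of the loss is controlled by the packing step.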

@Huite

Huite commented May 30, 2024

It looks a bit like a breadcrumb trail, to be honest, as xarray doesn't provide an overview. That is reasonable, since it depends on what's available in the netCDF4 / HDF5 binaries.

The relevant netCDF4-python docs (https://unidata.github.io/netcdf4-python/#efficient-compression-of-netcdf-variables) say:

zlib compression is always available, szip is available if the linked HDF5 library supports it, and zstd, bzip2, blosc_lz, blosc_lz4, blosc_lz4hc, blosc_zlib and blosc_zstd are available via optional external plugins.

For hydromt, you can safely assume that the binaries come from conda-forge, so whichever plugins are compiled into those builds are the relevant ones.

More info is probably only available in the netCDF docs directly, among them the page on quantizing: https://docs.unidata.ucar.edu/netcdf-c/current/md__media_psf_Home_Desktop_netcdf_releases_v4_9_2_release_netcdf_c_docs_quantize.html

Pretty confident that zlib is lossless.

Best approach IMO is to set up a new pixi env, see which schemes work, and make some examples. Would be useful documentation anyway!
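A starting point for such examples could be a small harness like the one below. It is only a sketch. Python's built-in zlib and bz2 modules stand in for the netCDF zlib and bzip2 filters, the grid is random hypothetical data, and `compression_table` is a made-up helper. It checks that each codec returns exactly the bytes it was given (i.e. is lossless) and reports the compression ratio for the original and the rounded grid.

```python
import bz2
import zlib
from functools import partial

import numpy as np

# Stand-ins for netCDF filters: zlib (always available) and bzip2 (plugin).
CODECS = {
    "zlib": (partial(zlib.compress, level=6), zlib.decompress),
    "bzip2": (partial(bz2.compress, compresslevel=9), bz2.decompress),
}


def compression_table(arrays):
    """Return {(array_name, codec_name): compression ratio}, checking round trips."""
    results = {}
    for array_name, arr in arrays.items():
        raw = arr.tobytes()
        for codec_name, (compress, decompress) in CODECS.items():
            blob = compress(raw)
            # A lossless codec must give back exactly the bytes it was given.
            if decompress(blob) != raw:
                raise ValueError(f"{codec_name} did not round-trip {array_name}")
            results[(array_name, codec_name)] = len(raw) / len(blob)
    return results


# Hypothetical 300 x 300 float32 grid, original and rounded to one decimal.
rng = np.random.default_rng(0)
grid = rng.uniform(0.0, 10.0, size=(300, 300)).astype(np.float32)
candidates = {"original": grid, "rounded_1dec": np.round(grid, 1)}

for (array_name, codec_name), ratio in compression_table(candidates).items():
    print(f"{array_name:>12} {codec_name:>5}: ratio {ratio:.2f}")
```

In the real pixi env, the same structure should carry over: replace the stand-in codecs with staticmaps.nc files written using different encodings, then compare the file sizes.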
