Round data in staticmaps.nc #231

hboisgon · 2024-01-22T07:27:46Z

Kind of request

Changing existing functionality

Enhancement Description

We have the request for staticgeoms but I think it would be good practice to round all grids in staticmaps.nc.
The number of decimals does not make sense. Eg

Use case

It would produce a smaller staticmaps.nc file. Not sure if it would have any impact on computation speed?

Additional Context

No response

Huite · 2024-01-22T11:07:12Z

Hi @hboisgon,

I saw an issue like this come up in my mentions earlier, for Wflow.jl: Deltares/Wflow.jl#314

Like I mention there: you generally don't want to round binary numbers. A float32 will always take 32 bits of memory, and a float64 will take 64 bits of memory. You might get smaller files if you turn compression on, and rounding might help a little since you are reducing the information content (so the compression algorithm will be able to find more redundancy), but you need to turn on compression in either case.
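
To make that concrete, here is a minimal sketch using only the Python standard library. The river widths are made-up values and the one-decimal rounding is just an assumption for illustration. It shows that rounding leaves the raw size untouched and only pays off once a compressor (here `zlib`) is applied:

```python
import random
import zlib
from array import array

random.seed(0)

# Hypothetical river widths in metres, stored as 32-bit floats ("f" typecode).
widths = array("f", (random.uniform(0.0, 50.0) for _ in range(50_000)))
# The same values rounded to one decimal, still stored as 32-bit floats.
rounded = array("f", (round(w, 1) for w in widths))

raw_full = widths.tobytes()
raw_rounded = rounded.tobytes()

# Rounding does not change the raw size: 4 bytes per value either way.
print(len(raw_full), len(raw_rounded))

# Only with compression does the reduced information content help.
packed_full = zlib.compress(raw_full, 6)
packed_rounded = zlib.compress(raw_rounded, 6)
print(len(packed_full), len(packed_rounded))
```

The exact compressed sizes depend on the data, so treat the numbers this prints as indicative only.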

So if you're looking to reduce file sizes, I recommend investigating compression instead. NetCDF4 always ships with zlib compression (other codecs depend on the HDF5 build and plugins); Zarr, for example, uses Blosc for far more performant compression.

With regard to the physical interpretation: if you want to add that, you should probably try adding metadata instead. You could argue that a river width is never more accurate than 1 cm (for example), but that doesn't generalize, e.g. if you're doing computational/numerical experiments.

And in that case you should do error propagation proper! That's stuff like this:
https://github.com/JuliaPhysics/Measurements.jl
https://pythonhosted.org/uncertainties/

And then ideally support it in an xarray package like pint does: https://xarray.dev/blog/introducing-pint-xarray
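
For readers unfamiliar with the idea, a hand-rolled sketch of first-order error propagation for a product of two independent quantities could look like the following. The `Measured` class and the numbers are hypothetical and only illustrate the principle; the packages linked above do this far more completely (correlations, arbitrary functions, and so on).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measured:
    """A value with one standard deviation of uncertainty (illustrative only)."""

    value: float
    sigma: float

    def __mul__(self, other: "Measured") -> "Measured":
        value = self.value * other.value
        # For a product of independent quantities, the relative
        # uncertainties add in quadrature (first-order approximation).
        rel = math.hypot(self.sigma / self.value, other.sigma / other.value)
        return Measured(value, abs(value) * rel)


width = Measured(12.0, 0.5)  # hypothetical river width in m
depth = Measured(2.0, 0.1)   # hypothetical river depth in m
area = width * depth         # cross-sectional area in m^2 with propagated sigma
print(area)
```

The point is that the uncertainty travels with the value, instead of being baked into the stored number by rounding.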

shartgring · 2024-05-30T08:16:01Z

This may also relate to https://docs.xarray.dev/en/latest/user-guide/io.html#writing-encoded-data. I read online (pydata/xarray#865 and pydata/xarray#1572) that lossy compression is possible and may go hand in hand with rounding the data, as accuracy is guaranteed for a certain number of digits, I guess similar to this: https://docs.unidata.ucar.edu/netcdf-c/current/md__media_psf_Home_Desktop_netcdf_releases_v4_9_2_release_netcdf_c_docs_quantize.html

I am not sure how this would work with zlib: is it a choice between lossy and lossless compression, or can the two be combined?

Huite · 2024-05-30T14:28:15Z

It looks a bit like a breadcrumbs trail to be honest, as xarray doesn't just provide an overview -- which is reasonable, since it depends on what's available in the netCDF4 / HDF5 binaries.

The relevant netCDF4-python docs (https://unidata.github.io/netcdf4-python/#efficient-compression-of-netcdf-variables) say:

"zlib compression is always available, szip is available if the linked HDF5 library supports it, and zstd, bzip2, blosc_lz, blosc_lz4, blosc_lz4hc, blosc_zlib and blosc_zstd are available via optional external plugins."

For hydromt, you can safely assume that the binary origin is conda-forge so whatever plugins are compiled there are relevant.

More info is probably only available on the netCDF docs directly, among them quantizing: https://docs.unidata.ucar.edu/netcdf-c/current/md__media_psf_Home_Desktop_netcdf_releases_v4_9_2_release_netcdf_c_docs_quantize.html

Pretty confident that zlib is lossless.
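
A quick way to convince yourself, and to see how a lossy step can be combined with it, is the standard-library sketch below. Note the assumptions: `truncate_mantissa` is a hypothetical helper that simply zeroes trailing mantissa bits. It is a simplified stand-in for the quantization algorithms described in the netCDF docs, not a reimplementation of them, and the data is random.

```python
import random
import struct
import zlib
from array import array


def truncate_mantissa(value: float, keep_bits: int) -> float:
    """Zero all but the leading `keep_bits` of the 23 float32 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    (out,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return out


random.seed(0)
data = array("f", (random.uniform(0.0, 50.0) for _ in range(50_000)))
raw = data.tobytes()

# zlib on its own: decompressing gives back exactly the same bytes.
is_lossless = zlib.decompress(zlib.compress(raw)) == raw

# Lossy step first (keep 10 mantissa bits), lossless zlib afterwards.
quantized = array("f", (truncate_mantissa(x, 10) for x in data))
max_rel_error = max(abs(q - x) / x for q, x in zip(quantized, data) if x != 0.0)

size_plain = len(zlib.compress(raw))
size_quantized = len(zlib.compress(quantized.tobytes()))
print(is_lossless, size_plain, size_quantized, max_rel_error)
```

So the two are not alternatives: the quantization is where information is lost, and zlib then compresses the more regular bit patterns without losing anything further.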

Best approach IMO is to set up a new pixi env, see which schemes work, and make some examples. Would be useful documentation anyway!

hboisgon added the labels "enhancement" (New feature or request) and "needs refinement" (issue still needs refinement) on Jan 22, 2024.
