What happened?
If one:
loads a dataset saved using engine="h5netcdf" with a string coordinate, say <U2
merges it with another dataset which matches but has longer strings in the same coordinate, say <U4
then saves that merged dataset using engine="h5netcdf"
then the encoding from loading the initial dataset, which survives the merge, causes the dataset variable to be silently truncated back to <U2, so that when it is loaded again the data is incorrect.
This is specific to the "h5netcdf" engine. This doesn't happen, however, with the "scipy" engine.
What did you expect to happen?
I guess the encoding should be dropped or updated during the merge call.
jcmgray changed the title from "merging and saving h5netcdf loaded datasets can lead to string truncation" to "merging and saving loaded datasets can lead to string truncation" on Nov 12, 2024
Hi @kmuehlbauer, thanks for the response. Apologies for missing the prior discussion around encoding - indeed simply dropping the encoding works perfectly for me, feel free to close.
For what it's worth, my thoughts behavior/docs-wise (as a user who hasn't needed to think about encoding before):
having it dropped automatically would make sense to me
a warning that data truncation is happening on write might be nice (it was quite hard to pin down exactly where this was happening!)
similarly, it might be good to warn in the docs that if encoding and data-type get out of sync it can lead to truncation of data
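To make the last two points concrete, here is a minimal in-memory sketch of checking for an encoding that has fallen out of sync with the data, and of the drop-the-encoding workaround. It makes a few assumptions: `find_narrow_string_encodings` is a hypothetical helper (not part of xarray), and the `<U2` encoding is set by hand to mimic what a loaded dataset might carry. `Dataset.drop_encoding()` is the only xarray-specific call relied on, and it needs a reasonably recent xarray.

```python
import numpy as np
import xarray as xr


def find_narrow_string_encodings(ds):
    """List variables whose encoded fixed-width string dtype is narrower than the data.

    Hypothetical helper for illustration; not part of xarray.
    """
    narrow = []
    for name, var in ds.variables.items():
        enc = var.encoding.get("dtype")
        if enc is None or var.dtype.kind not in "US":
            continue
        try:
            enc = np.dtype(enc)
        except TypeError:
            continue  # not something NumPy understands as a dtype
        # itemsize 0 means "no fixed width" (e.g. plain `str`), so skip it
        if enc.kind == var.dtype.kind and 0 < enc.itemsize < var.dtype.itemsize:
            narrow.append(name)
    return narrow


# In-memory stand-in for a merged dataset: the data is <U4, but the
# encoding (set by hand here) still claims <U2.
ds = xr.Dataset(coords={"x": np.array(["ab", "abcd"])})
ds["x"].encoding["dtype"] = np.dtype("<U2")

stale = find_narrow_string_encodings(ds)

# Dropping all encoding before writing avoids the mismatch.
clean = ds.drop_encoding()
```

A check along these lines could be run just before `to_netcdf` to flag affected variables. Note that `drop_encoding()` also discards any other encoding (compression, chunking, fill values), which may or may not be what you want.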
Minimal Complete Verifiable Example
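The original example is not preserved in this copy of the issue. The following is a rough sketch that follows the steps described under "What happened?", not the author's own code. The function name `roundtrip_after_merge`, the variable names, and the sample values are all assumptions, and whether truncation actually shows up may depend on the xarray and backend versions installed.

```python
import tempfile
from pathlib import Path

import xarray as xr


def roundtrip_after_merge(engine="h5netcdf"):
    """Save, load, merge with longer strings, save again, and reload.

    Hypothetical helper reconstructing the described steps.
    """
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "first.nc"
        second = Path(tmp) / "merged.nc"

        # 1. Save a dataset whose string coordinate has dtype <U2.
        ds1 = xr.Dataset({"a": ("x", [1.0, 2.0])}, coords={"x": ["aa", "bb"]})
        ds1.to_netcdf(first, engine=engine)

        # 2. Load it back; the loaded coordinate now carries encoding.
        loaded = xr.load_dataset(first, engine=engine)

        # 3. Merge with a dataset that has a longer string in the same coordinate.
        other = xr.Dataset({"b": ("x", [3.0])}, coords={"x": ["cccc"]})
        merged = xr.merge([loaded, other], join="outer")

        # 4. Save the merged dataset and load it once more.
        merged.to_netcdf(second, engine=engine)
        reloaded = xr.load_dataset(second, engine=engine)

    return merged, reloaded
```

Comparing `merged["x"].values` with `reloaded["x"].values` after calling the function shows whether the longer string survived the second write in a given environment.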
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 18:08:52) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: None
xarray: 2024.10.0
pandas: 2.2.3
numpy: 2.0.2
scipy: 1.14.1
netCDF4: None
pydap: None
h5netcdf: 1.4.0
h5py: 3.11.0
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.10.0
distributed: 2024.10.0
matplotlib: 3.9.2
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2024.9.0
cupy: None
pint: None
sparse: 0.15.4
flox: None
numpy_groupies: None
setuptools: 75.1.0
pip: 24.2
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.28.0
sphinx: 8.1.3