Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix h5repack for variable-length datatyped datasets (#3331) #3333

Merged
merged 1 commit into from
Aug 2, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 15 additions & 1 deletion release_docs/RELEASE.txt
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,21 @@ Bug Fixes since HDF5-1.14.1 release

Tools
-----
-
- Fixed an issue in h5repack for variable-length typed datasets

When repacking datasets into a new file, h5repack tries to determines whether
it can use H5Ocopy to copy each dataset into the new file, or if it needs to
manually re-create the dataset, then read data from the old dataset and write
it to the new dataset. H5repack was previously using H5Ocopy for datasets with
variable-length datatypes, but this can be problematic if the global heap
addresses involved do not match exactly between the old and new files. These
addresses could change for a variety of reasons, such as the command-line options
provided to h5repack, how h5repack allocate space in the repacked file, etc.
Since H5Ocopy does not currently perform any translation when these addresses
change, datasets that were repacked with H5Ocopy could become unreadable in the
new file. H5repack has been fixed to repack variable-length typed datasets without
using H5Ocopy to ensure that the new datasets always have the correct global heap
addresses.


Performance
Expand Down
13 changes: 12 additions & 1 deletion tools/src/h5repack/h5repack_copy.c
Original file line number Diff line number Diff line change
Expand Up @@ -653,6 +653,7 @@ do_copy_objects(hid_t fidin, hid_t fidout, trav_table_t *travt, pack_opt_t *opti
int ifil;
int is_ref = 0;
htri_t is_named;
htri_t is_vlen = 0;
hbool_t limit_maxdims;
hsize_t size_dset;
int ret_value = 0;
Expand Down Expand Up @@ -806,6 +807,10 @@ do_copy_objects(hid_t fidin, hid_t fidout, trav_table_t *travt, pack_opt_t *opti
if (H5T_REFERENCE == H5Tget_class(ftype_id))
is_ref = 1;

/* early detection of variable-length types */
if ((is_vlen = H5Tdetect_class(ftype_id, H5T_VLEN)) < 0)
H5TOOLS_GOTO_ERROR((-1), "H5Tdetect_class failed");

/* Check if the datatype is committed */
if ((is_named = H5Tcommitted(ftype_id)) < 0)
H5TOOLS_GOTO_ERROR((-1), "H5Tcommitted failed");
Expand All @@ -823,10 +828,16 @@ do_copy_objects(hid_t fidin, hid_t fidout, trav_table_t *travt, pack_opt_t *opti
* check if we should use H5Ocopy or not
* if there is a request for filters/layout, we read/write the object
* otherwise we do a copy using H5Ocopy
*
* Note that H5Ocopy is currently unsafe to use for objects that reside in
* or interact with global heaps, such as variable-length datatypes. This
* appears to be due to H5Ocopy not correctly translating in the case where
* these objects move to different global heap addresses in the repacked
* file.
*-------------------------------------------------------------------------
*/
use_h5ocopy = !(options->op_tbl->nelems || options->all_filter == 1 ||
options->all_layout == 1 || is_ref || is_named);
options->all_layout == 1 || is_ref || is_vlen || is_named);

/*
* Check if we are using different source and destination VOL connectors.
Expand Down