Extend NCZarr to support unlimited dimensions. #2736

Closed
wants to merge 4 commits into from

Conversation

DennisHeimbigner
Collaborator

The existing NCZarr extensions to Zarr are modified to support unlimited dimensions. NCZarr extends the Zarr metadata for the ".zgroup" object to include netcdf-4 model extensions. This information is stored in ".zgroup" as a dictionary named "_nczarr_group". Inside "_nczarr_group", there is a key named "dims" that stores information about netcdf-4 named dimensions. The value of "dims" is a dictionary whose keys are the dimension names. The value associated with each dimension name has one of two forms. Form 1 is a special case of form 2 and is kept for backward compatibility. Whenever a new file is written, it uses form 2. The two forms are as follows.

  1. An integer representing the size of the dimension, which is used for simple named dimensions.
  2. A dictionary with the following keys and values:
    • "size" with an integer value representing the (current) size of the dimension.
    • (optional) "unlimited" with a value of either "1" or "0" to indicate if this dimension is an unlimited dimension.
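
To make the two forms concrete, here is a minimal Python sketch of a reader that accepts either form and normalizes it to form 2. It is an illustration only, not the netcdf-c implementation: the function name `normalize_dims` and the sample dimension names are hypothetical, the sample ".zgroup" content omits any other keys that a real "_nczarr_group" dictionary may carry, and the sketch assumes that a missing "unlimited" key means the dimension is not unlimited.

```python
import json


def normalize_dims(zgroup_text):
    """Return {name: {"size": int, "unlimited": bool}} from .zgroup JSON text.

    Hypothetical helper: accepts form 1 (bare integer) and form 2
    (dictionary with "size" and optional "unlimited").
    """
    zgroup = json.loads(zgroup_text)
    dims = zgroup.get("_nczarr_group", {}).get("dims", {})
    normalized = {}
    for name, value in dims.items():
        if isinstance(value, dict):
            # Form 2: {"size": N, "unlimited": "1" | "0"}
            size = int(value["size"])
            # Assumption: an absent "unlimited" key means "not unlimited".
            unlimited = str(value.get("unlimited", "0")) == "1"
        else:
            # Form 1: a bare integer size, never unlimited.
            size = int(value)
            unlimited = False
        normalized[name] = {"size": size, "unlimited": unlimited}
    return normalized


# Illustrative .zgroup content; the dimension names are made up.
EXAMPLE_ZGROUP = """
{
  "zarr_format": 2,
  "_nczarr_group": {
    "dims": {
      "lat": 180,
      "lon": {"size": 360},
      "time": {"size": 0, "unlimited": "1"}
    }
  }
}
"""

if __name__ == "__main__":
    for dim_name, info in normalize_dims(EXAMPLE_ZGROUP).items():
        print(dim_name, info)
```

Normalizing on read keeps the rest of a consumer independent of which form a given file used, which is one way to honor the backward-compatibility requirement for form 1.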

For unlimited dimensions, the size is initially zero, and as variables extend the length of that dimension, the size value for the dimension increases. The dimension size is shared by all arrays referencing that dimension, so if one array extends an unlimited dimension, it is implicitly extended for all other arrays that reference that dimension. These are the standard semantics for unlimited dimensions.
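
The shared-size behavior can be sketched with a toy Python model. This is not how netcdf-c or NCZarr implements it; the `Dimension` and `Array` classes and their methods are hypothetical and exist only to show that extending a dimension through one array changes the shape seen by every other array that references the same dimension.

```python
class Dimension:
    """Toy named dimension; unlimited dimensions start at size zero."""

    def __init__(self, name, size=0, unlimited=False):
        self.name = name
        self.size = size
        self.unlimited = unlimited


class Array:
    """Toy array whose shape is derived from shared Dimension objects."""

    def __init__(self, name, dims):
        self.name = name
        self.dims = list(dims)

    @property
    def shape(self):
        # The shape is always read from the shared dimensions.
        return tuple(d.size for d in self.dims)

    def extend(self, axis, new_size):
        """Grow the dimension along `axis` to at least `new_size`."""
        dim = self.dims[axis]
        if new_size <= dim.size:
            return  # nothing to extend
        if not dim.unlimited:
            raise ValueError(f"dimension {dim.name!r} is not unlimited")
        dim.size = new_size


if __name__ == "__main__":
    time = Dimension("time", unlimited=True)
    station = Dimension("station", size=3)
    temperature = Array("temperature", [time, station])
    pressure = Array("pressure", [time])

    temperature.extend(0, 5)   # write past the current end of "time"
    print(temperature.shape)   # shape of the array that was extended
    print(pressure.shape)      # shape of the array that shares "time"
```

Because both arrays hold a reference to the same `Dimension` object, there is a single size to update, which mirrors storing the current size once under "dims" rather than per array.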

Related changes.

Adding unlimited dimensions required a number of other changes to the NCZarr code-base. These included the following.

  • Partial refactor of the slice handling code in zwalk.c to clean it up.
  • Add a number of tests for unlimited dimensions derived from the corresponding tests in nc_test4.
  • Add several NCZarr-specific unlimited tests; more are needed.
  • Add a test of endianness.

Misc. Other changes

  • Fixed an obscure memory leak in ncdump.
  • Removed some obsolete unit testing code and test cases.
  • Uncovered a bug in the netcdf-c handling of big-endian floats and doubles. It has not been fixed yet; see tst_h5_endians.c.
  • Renamed some nczarr_tests testcases to avoid name conflicts with nc_test4.

@czender
Contributor

czender commented Aug 16, 2023

This will be so helpful for shifting netCDF storage to the cloud... Can't wait to see it merged!

@DennisHeimbigner
Collaborator Author

This PR is superseded by #2755
