This repository contains thin high-level bindings for the HDF5 data format for the Nim programming language. It also provides a wrapper of the full HDF5 C library, importable using:
import nimhdf5/hdf5_wrapper
The raw wrapper dynamically links the libhdf5.so (the main library) and libhdf5_hl.so (a library containing high-level convenience functions) libraries at runtime. All public functions of the two libraries are callable by their corresponding C names and C arguments. That means the Nim datatypes need to be manually cast to the corresponding compatible types, if only the wrapper is used.
The high-level bindings, while covering most general HDF5 features, are still in a rough state, due to limited testing by actual users. Most features should work fine (at least on Linux), but some known bugs remain. See examples/h5_high_level_example.nim for an overview (soon to be cleaned up, split and put into tutorial form) of the available features and their usage. Also take a look at the tests for simple examples of specific features. For a more in-depth example of usage of this library, take a look at:
- https://github.com/Vindaar/TimepixAnalysis/blob/master/InGridDatabase/src/ingridDatabase as an example using HDF5 as a simple database
- https://github.com/Vindaar/TimepixAnalysis/blob/master/Analysis/ingrid/raw_data_manipulation.nim for usage of more advanced features like variable length data, writing hyperslabs, creating hard links, etc. in a bigger project.
The wrapper was built making heavy use of c2nim and the main goal was to have a usable interface for the HDF5 data format. More advanced features (e.g. single writer / multiple reader) were of lower priority. Via the wrapper all features should in principle work, but many have not been tested.
The wrapper is currently tested using Nim version 0.18.0 and the current devel branch (0.18.1).
The wrapper is built from HDF5 version 1.10.1. Linking against the HDF5 1.8 library is reasonably supported as well, but requires an additional compiler flag for now:
-d:H5_LEGACY
With the recent release of versions 1.10.3 / 1.10.4, support for these versions is currently available under the -d:H5_FUTURE flag. Soon this will become the default! The H5Oget/visit_...2 procedures are wrapped under names without the 2 suffix. They take an additional fields argument, however, so an overload is available that sets fields to H5O_INFO_ALL.
If you compile a Nim program using nimhdf5 without any of those flags and try to run it on a system with an HDF5 shared library of version 1.10.3 or newer, you will be greeted by:
could not import: H5Oget_info
In that case, add the -d:H5_FUTURE flag to the compilation command (or add it to the nim.cfg or config.nims of the project).
On the other hand, if you try to run such a compiled binary on a system with an HDF5 library of version 1.8, you will probably see:
could not import: H5P_LST_FILE_CREATE_g
In that case, add the -d:H5_LEGACY flag.
In case neither of these work, please open an issue!
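To avoid passing a flag on every compile, it can live in the project configuration. The following is a minimal sketch of a config.nims, assuming the target system ships HDF5 1.10.3 or newer; for a 1.8 library the commented-out line would be used instead. Only one of the two flags should be needed at a time.

```nim
# config.nims (NimScript), placed next to the project's main .nim file.
# Has the same effect as passing -d:H5_FUTURE on the command line.
switch("define", "H5_FUTURE")

# For a system with an HDF5 1.8 library use this instead:
# switch("define", "H5_LEGACY")
```

In a nim.cfg the equivalent is a single line reading -d:H5_FUTURE.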
Currently no checks compare the library version this wrapper was built from with the library actually linked against, e.g. by using some of the provided HDF5 facilities (H5check, H5get_libversion etc.). The main reason is explained below. However, as far as the high-level functionality is concerned at the moment, the only differences arise in a few constant definitions, whose names changed slightly from 1.8 to 1.10. The compiler flags select these accordingly.
The HDF5 headers contain macros for many variables, such as
#define H5F_ACC_RDONLY (H5CHECK H5OPEN 0x0000u)
where
#define H5CHECK H5check(),
and
#define H5OPEN H5open(),
i.e. it makes use of C’s comma operator. However, c2nim currently has no support for it. Instead of porting them in some reasonable way, these macros were converted to simple replacements with the values, dropping the calls to H5check() and H5open().
The call to H5check() is currently not used at all. If it were, a Nim program compiled with this wrapper (based on version 1.10.1) would normally fail that check whenever the linked library has a different version.
As H5open() is important, these calls are replaced by a single call that initializes the library once, on first use of the library, via src/nimhdf5/wrapper/H5niminitialize.nim.
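The idea of dropping the comma-operator macros and initializing once instead can be sketched as follows. This is only an illustration and not the content of H5niminitialize.nim: all names (h5openStandIn, ensureInitialized, AccRdOnlySketch, openFileSketch) are hypothetical, and nothing here calls into libhdf5.

```nim
# Sketch only: stand-ins for the C side, nothing here calls into libhdf5.
var initCalls = 0        # counts how often the stand-in for H5open() runs
var initialized = false

proc h5openStandIn() =
  ## Plays the role of the C function H5open().
  inc initCalls

proc ensureInitialized() =
  ## Runs the initialization once; later calls are no-ops.
  if not initialized:
    h5openStandIn()
    initialized = true

# In C: #define H5F_ACC_RDONLY (H5CHECK H5OPEN 0x0000u)
# After the conversion only the plain value is left.
const AccRdOnlySketch = 0x0000'u32

proc openFileSketch(flags: uint32): uint32 =
  ## Hypothetical entry point: initializes on first use, then returns the flags.
  ensureInitialized()
  result = flags

discard openFileSketch(AccRdOnlySketch)
discard openFileSketch(AccRdOnlySketch)
echo "initialization calls: ", initCalls
```

In C every use of the constant would trigger H5open() through the comma operator. In the sketch, the constant is a plain value and the initialization happens exactly once, regardless of how often it is used.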
As HDF5 is a very macro-heavy library, other important macros may not have been correctly wrapped to Nim, e.g. determination of correct sizes of data types. This may cause some weird side effects (to be fair, I haven’t noticed any!).
Additionally, Windows support is unknown at this time. The library name is correctly set for Windows, however an additional header file, H5FDwindows.h, might have to be wrapped.
Installation can either be done via nimble:
nimble install nimhdf5
or manually by cloning this git repository:
git clone https://github.com/vindaar/nimhdf5
in a folder of your choice and call nimble install afterwards:
cd nimhdf5
nimble install
Or simply make use of nimble’s GitHub interfacing capabilities:
nimble install https://github.com/vindaar/nimhdf5
The folder c_headers contains the modified HDF5 headers in the state they were in for a successful c2nim conversion. In some cases the C header file had to be modified, in others modification to the resulting .nim file was still necessary.
The folder examples contains the basic HDF5 C examples (see here: https://support.hdfgroup.org/HDF5/examples/intro.html#c) converted to Nim utilizing the wrapper.
h5_high_level_example.nim serves as a replacement for a tutorial for now (tutorial will be added soon!), showcasing (almost) all available features and their usage.
The high level bindings come with several quirks which are good to know.
- when reading back a dataset with dimension > 1, the data is returned in a flat seq, instead of e.g. a nested seq[seq[<type>]] as one might expect. To get the data in the correct shape, use the reshape (or reshape2D, reshape3D) procs from util.nim. See the example file or the following tests: tutil.nim, treshape.nim for the usage. The exception is variable length data in case of a 1D dataset containing seqs of varying sizes. Here a nested seq of the correct elements is returned.
- when grabbing a group or dataset from a H5FileObj via [](name: string), a conversion of the string to a distinct string type grp_str or dset_str is used to provide a uniform interface for both from a file object.
- 1D datasets do not have shape (N, ) as one would see in Python, but are represented by (N, 1) instead.
- and many more
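To make the first quirk concrete, the following sketch shows what turning a flat seq into a nested one amounts to. It uses only the Nim standard library; toNested2D is a hypothetical helper written for this illustration and is not the reshape2D proc of util.nim, whose exact signature is best taken from the tests mentioned above. It assumes the flat data is in row-major order.

```nim
proc toNested2D[T](flat: seq[T]; rows, cols: int): seq[seq[T]] =
  ## Hypothetical helper (not part of nimhdf5): splits a flat, row-major
  ## seq into `rows` seqs of `cols` elements each.
  doAssert flat.len == rows * cols, "shape does not match number of elements"
  result = newSeq[seq[T]](rows)
  for i in 0 ..< rows:
    let start = i * cols
    result[i] = flat[start ..< start + cols]

# A 2 x 3 dataset as it would come back flattened.
let flat = @[1, 2, 3, 4, 5, 6]
let nested = toNested2D(flat, 2, 3)
echo nested
```

The shape has to be supplied by the caller, since the flat seq itself no longer carries it.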
- groups
  - creating (nested) groups
  - iterating over groups (recursively)
- datasets
  - writing / reading static sized N-D arrays of any type
  - writing / reading variable length data
  - chunked storage
- data types:
  - any basic nim type, that is:
    - SomeNumber (all ints and floats)
    - string (not for datasets atm)
  - compound datatypes of objects / tuples, where the fields have to be of the above mentioned basic types.
- hyperslabs
  - writing / reading hyperslabs using H5 notation
- compression / filters
  - zlib compression
  - szip compression
  - blosc compression (external)
    User needs to compile / install:
    Note: Windows / OSX not yet supported, due to wrong name of libblosc.so in blosc.nim#L6. Change it appropriately.
  - sort of soon: fletcher32, shuffle, nbits
- attributes
  - writing / reading on datasets, groups
  - all types supported
    - basic types (int, float, …)
    - seqs of basic types
    - strings
  - reading variable length strings (different from static length strings in H5 attributes!)
- hardlink datasets and groups within a file
- iterators over:
  - groups
  - datasets
  - attributes
- Single Writer Multiple Reader (SWMR). See below for more info.
This wrapper fully supports the Single Writer Multiple Reader feature of the HDF5 library, but it is still in an experimental state, as I’ve never really needed it.
It allows accessing a single HDF5 file from multiple threads or processes, where one of these is a writer and all others are readers. When using this feature the user does not have to worry about locks etc. between the different processes.
Open an HDF5 file in write mode and pass the swmr flag:
# writer.nim
import nimhdf5
var h5f = H5open("/tmp/test.h5", "rw", swmr = true)
# do writing stuff
and in all reader threads / processes, simply do the same, but open the file read-only instead:
# reader.nim
import nimhdf5
var h5f = H5open("/tmp/test.h5", "r", swmr = true)
This should be all that is required.
I’m not sure if the writer process should make sure to flush the file regularly or not. Feel free to tell me if you know. :)
An alternative to the above for the writer process is to first open the file in write mode without swmr = true and then later put it into swmr mode via:
import nimhdf5
var h5f = H5open("/tmp/test.h5", "rw")
# do some regular stuff
# and then activate SWMR later
h5f.activateSWMR()
This wrapper can also be used with an HDF5 library that was compiled with the --enable-threadsafe compilation flag.
Once the library has been compiled with it, in principle the user can try to open a single file in write mode from multiple processes or threads. For safe handling in these contexts, it may be up to the user to lock access to the file / writing to individual datasets via some locking mechanism. In principle the threadsafe option of the library adds its own mutex logic, so in theory it should work without them.
See these notes about the threadsafe library: https://support.hdfgroup.org/HDF5/doc/TechNotes/ThreadSafeLibrary.html
One issue a user might encounter is that the second opening of a file yields an error saying that the resource is temporarily unavailable. The cause is file locking, which was added as a feature starting with HDF5 version 1.10.
This behavior is controlled via an environment variable:
export HDF5_USE_FILE_LOCKING=FALSE
If set to FALSE, file locking is disabled, so multiple processes may open the same file.
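If exporting the variable in the shell is inconvenient, it can presumably also be set from within the Nim program. The sketch below uses only the standard library. It rests on the assumption, which I have not checked for every HDF5 version, that the library reads the variable when it opens a file, so the call has to happen before any file is opened.

```nim
import os  # provides putEnv / getEnv

# Assumption: this runs before the first HDF5 file is opened,
# so that the library sees the setting.
putEnv("HDF5_USE_FILE_LOCKING", "FALSE")

# Read the value back to confirm that it is set for this process.
echo getEnv("HDF5_USE_FILE_LOCKING")
```

The setting only affects the current process and processes started from it, so each reader and writer has to set it for itself.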
When doing this, keep in mind that each thread / process will receive its own FileID. Some HDF5 functions give the user information based on the specific file ID, others based on the actual file. In the cases where one can choose, this is supported via the okLocal (ObjectKind enum) or fkLocal (FlushKind enum) values.
Relevant part of the documentation: https://docs.hdfgroup.org/hdf5/develop/_h5public_8h.html#title31
To use blosc as a filter you need to import:
import nimhdf5/blosc
Before v0.3.12 this was done automatically if the nblosc library was installed.