HDF5 Introduction

Outline

Understanding HDF5 is more than a technical exercise.

HDF5 wasn’t designed by committee. It’s the result of an evolutionary process.

img/outline.png

HDF5 is what it is because of how different communities use it to solve data management problems.

We will tell you a little bit about ourselves (The HDF Group), the stewards and maintainers of HDF5 software and technologies.

Last but not least, HDF5 couldn’t endure and thrive without an ecosystem.

The HDF Group

A non-profit company on a mission:

To make the management of large & complex data as simple as possible, but no simpler.

Our values:

  • Simple data model
  • Free Open Source Software
  • Technical excellence
  • Diverse community

We were excited about big data 20 years ago and welcome the new generation of “dataphiles.” Today, we are no less excited about big data, but also about the new processing and storage capabilities and economies.

We are working hard to bring the associated benefits to our users and customers while protecting their current investment.

Our biggest challenge: HDF spreads through word-of-mouth and not marketing. Spread the word!

The HDF Community

It’s hard to find a walk of life not touched by HDF5. (Maybe literary studies?)

  • Public and private sector
  • Academia, government, industry
  • Organizations (NASA, DOE, CEA, ITER, DESY)
  • Standards bodies (OGC, EXPRESS, Allotrope, CGNS, RESQML)
  • Sectors (aerospace, oil & gas, pharma, power systems, finance)
  • Research (astronomy, life science, materials science, ML)
  • Data integrators, life cycle managers
  • Developers

We like to speak of the HDF5 community as an iceberg, of which we see only about 10%. Despite our best efforts, we usually find out that someone is using our software “by accident.” That leaves plenty of room for interpretation:

  • People don’t know of our existence (we have some evidence of that)
  • Our product is perfect and there’s no need for any further communication (we know better)
  • We are such terrible people that nobody wants to talk to us (perhaps)

Starting Point

  • “Raw data” is an oxymoron (It needs a domain-specific interpretive base.)
  • Data comes in many representations: in registers, memory, storage…
  • Some data is used as metadata for other data.
  • Data can be “at rest” (files, objects) or in transit between environments (I/O)

Data management is the art of:

  1. Maintaining data integrity
  2. Leaving clues for the interpretive base
  3. Removing “data friction”

Many challenges around data management are not exactly new. Big Data’s three Vs (volume, velocity, variety) turn what used to be mere inconvenience into major pain points.

H - D - F - 5

Historically & literally: HDF5 = H ierarchical D ata F ormat v 5

The meaning of a word is its use in the language. (Wittgenstein)

Depending on your use, HDF5 is:

  1. A generic (=not specific) data management interface with a bent toward multi-dimensional arrays
    • HDF5 sits on the periphery
    • You get some of the benefits, e.g., portability
  2. A data management interface toolkit.
    • HDF5 is part of the fabric
    • You create domain-specific interfaces to manage self-describing data representations
    • You get a streamlined data life cycle

HDF5-based interfaces often surpass other solutions in power, sophistication, simplicity, performance, and portability.

Hierarchical

Items are presented in a hierarchically structured name space

img/hdf5-container.png

Data

Complex data & Metadata

  • (Complex = consisting of interconnected or interwoven parts, composite)
  • Physical experiments & observational data
  • Simulations
  • Values of function evaluations
  • Streams

Data is “caught” in a web of objects. Links and nodes can be pre-defined or user-defined items.

img/technical-detail.png
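
To make the idea of links concrete, here is a minimal h5py sketch (the file and object names are purely illustrative): the same dataset is reachable both through its original hard link and through a user-defined soft link.

import h5py, numpy as np

with h5py.File('web.hdf5', 'w') as f:
    f['/raw/sensor1'] = np.arange(10.0)               # dataset, reachable via a hard link
    grp = f.create_group('derived')
    grp['alias'] = h5py.SoftLink('/raw/sensor1')      # user-defined soft link to the same data
    f['/raw'].attrs['comment'] = 'nodes and links form the container graph'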

Format

An arrangement of data such that it can be processed or stored by a computer

Many different layouts exist for such HDF5 containers:

  • Single file
  • Multiple files
  • Memory buffers
  • Collection of objects in Intel DAOS or Amazon S3

And you can create your own layouts!
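
For example, the layout can be an in-memory buffer instead of a file on disk. Here is a small h5py sketch using the HDF5 core file driver (with backing_store=False, the container image lives only in memory):

import h5py, numpy as np

# The same container "format," laid out in a memory buffer rather than on disk
with h5py.File('in-memory.hdf5', 'w', driver='core', backing_store=False) as f:
    f['values'] = np.linspace(0.0, 1.0, 11)
    print(f['values'][:])   # the data never touches the file system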

(Version) 5

We’ve tried (in versions 1, 2, 3, and 4) to make all original mistakes for you!

Hello, HDF5!

The introduction to Andrew Collette’s book

./img/Python_and_HDF5.png

begins with an intuitive example.

Suppose we have multiple weather stations for which we would like to record temperature and wind speed time series. Let’s also assume that we acquire those samples at fixed sampling intervals.

This snippet of Python code shows how to arrange measurements from multiple stations in a single HDF5 container and how to capture important metadata such as platform characteristics and sampling rates.

import h5py, numpy as np, platform as pfm

# Weather stations record temperatures and wind speeds

with h5py.File('hello.hdf5', 'w') as f:
    f.attrs['system'] = pfm.system()
    f.attrs['release'] = pfm.release()
    f.attrs['processor'] = pfm.processor()

    # station ID 15
    temperature = np.random.random(1024)
    dt = 10.0   # Temperature sampled every 10 seconds
    wind = np.random.random(2048)
    dt_wind = 5.0   # Wind speed sampled every 5 seconds
    f['/15/temperature'] = temperature
    f['/15/temperature'].attrs['dt'] = dt
    f['/15/wind'] = wind
    f['/15/wind'].attrs['dt'] = dt_wind
    # station 20
    # f["/20/..."] = ...

from pathlib import Path
print('File size: {} bytes'.format(Path('hello.hdf5').stat().st_size))
File size: 32768 bytes

After running the example, we have an HDF5 file containing temperature and wind speed time series from one or more weather stations.
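
Reading the data back is just as straightforward. A short sketch, using the same file and paths as above: datasets behave like NumPy arrays, and attributes like small dictionaries attached to objects.

import h5py

with h5py.File('hello.hdf5', 'r') as f:
    print(f.attrs['system'], f.attrs['release'])    # container-level metadata
    temperature = f['/15/temperature'][...]         # read the whole dataset
    dt = f['/15/temperature'].attrs['dt']           # sampling interval
    print(temperature.shape, dt)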

Data Model

Judging from our Hello, HDF5! example, we are dealing with nested groupings of arrays. There’s one grouping for each weather station and there are two array variables (temperature and wind) per grouping. This is almost accurate, except that all weather station groupings are part of the so-called root group, and that there are additional decorations (system characteristics, sampling rates).

img/hello-hdf5.png

For the purpose of this introduction, it’s OK to think of an HDF5 container as a file system in a file. (Of course, as long as there is a file…)

Speaking informally, the HDF5 data model includes two primitives and a set of combination rules. HDF5 is about describing array variables and their relationships.

img/hdf5-primitives.png

(Datasets and attributes are roles in which array variables can be used in HDF5, and different rules apply to them.)

People often miss the simplicity of the HDF5 data model for irrelevant technical details. HDF5 represents data as (values of) variables and relationships among them. Isn’t that how mathematical models work? If there is any “secret sauce” to HDF5, this is it, and it’s hidden in plain sight.
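
One way to see the model at work is to walk the hierarchy. The following h5py sketch (run against the hello.hdf5 file from above) prints every group and dataset together with its attributes:

import h5py

def show(name, obj):
    kind = 'dataset' if isinstance(obj, h5py.Dataset) else 'group'
    print(kind, '/' + name, 'attrs:', list(obj.attrs))

with h5py.File('hello.hdf5', 'r') as f:
    print('group / attrs:', list(f.attrs))   # the root group
    f.visititems(show)                       # visit every object reachable from the root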

Ecosystem

HDF5 couldn’t survive without users, without CONTRIBUTORS, or without an ecosystem. Unless you are writing C code all the time (condolences!), it’s very likely that you are benefitting from the work of someone who doesn’t work for The HDF Group. Support us to support them, or support them directly!

Language bindings

  • It’s hard to find a language for which there is no HDF5 binding or API
  • The HDF Group develops and maintains a reference implementation in C and bindings for Fortran
  • The community provides excellent bindings for Python, R, C++, Julia, .NET, Java…
  • Third parties support HDF5 in their products, e.g., MathWorks, National Instruments, Wolfram, etc.

IoT / Edge

Download a fully featured Python 3 IDE to your mobile device from the Google Play store.

./img/Pydroid3.png

Homework: Run the "Hello, HDF5!" example on your phone and look at the file on your computer!

HPC (aka. Parallel HDF5) <sec:hpc>

img/phdf5.png

Below we illustrate how to transition from a single process writing to a dataset to multiple MPI processes writing to different parts of a single dataset in a shared HDF5 file. The code is written to emphasize the similarities and to highlight the few places where the two versions differ. The common portions are shown in the appendix.

A single process writing to a single dataset

The basic flow is as follows:

  1. Create an HDF5 file (line (seq-fcrt))
  2. Create an HDF5 dataset (line (seq-dcrt))
  3. Select the destination in the file (line (seq-sel))
  4. Write a data buffer (line (seq-wrt))
#include "literate-hdf5.h"
#define SIZE 1024*1024

int main(int argc, char** argv)
{
  hid_t fapl, file, dset, file_space;
  float* buffer;
  hsize_t file_size;

  fapl = H5Pcreate(H5P_FILE_ACCESS);
  file = H5Fcreate("single-proc.h5", H5F_ACC_TRUNC, H5P_DEFAULT,
                   fapl); // (ref:seq-fcrt)

  dset = (*
          <<make-dataset>>) (file, "1Mi-floats", SIZE); // (ref:seq-dcrt)
  file_space = H5Dget_space(dset);
  H5Sselect_all(file_space);  // (ref:seq-sel)

  <<create-buffer-and-write>> // (ref:seq-wrt)

  <<clean-up>>
}
h5dump -pBH single-proc.h5
HDF5 "single-proc.h5" {
SUPER_BLOCK {
   SUPERBLOCK_VERSION 0
   FREELIST_VERSION 0
   SYMBOLTABLE_VERSION 0
   OBJECTHEADER_VERSION 0
   OFFSET_SIZE 8
   LENGTH_SIZE 8
   BTREE_RANK 16
   BTREE_LEAF 4
   ISTORE_K 32
   FILE_SPACE_STRATEGY H5F_FSPACE_STRATEGY_FSM_AGGR
   FREE_SPACE_PERSIST FALSE
   FREE_SPACE_SECTION_THRESHOLD 1
   FILE_SPACE_PAGE_SIZE 4096
   USER_BLOCK {
      USERBLOCK_SIZE 0
   }
}
GROUP "/" {
   DATASET "1Mi-floats" {
      DATATYPE  H5T_IEEE_F32LE
      DATASPACE  SIMPLE { ( 1048576 ) / ( 1048576 ) }
      STORAGE_LAYOUT {
         CONTIGUOUS
         SIZE 4194304
         OFFSET 2048
      }
      FILTERS {
         NONE
      }
      FILLVALUE {
         FILL_TIME H5D_FILL_TIME_IFSET
         VALUE  H5D_FILL_VALUE_DEFAULT
      }
      ALLOCATION_TIME {
         H5D_ALLOC_TIME_LATE
      }
   }
}
}

Multiple MPI processes writing to a single dataset in a shared file

The basic flow is exactly the same as in the sequential case:

  1. Create an HDF5 file (line (par-fcrt))
  2. Create an HDF5 dataset (line (par-dcrt))
  3. Select the destination in the file (lines (par-sel1) - (par-sel2))
  4. Write a data buffer (line (par-wrt))

There are only two differences between the sequential case and the MPI-parallel case:

  1. We have to instruct the HDF5 library to use the MPI-IO layer (line (par-fapl))
  2. Since the data buffers from different MPI ranks are destined for different “offsets” in the dataset, the selection process is rank dependent (lines (par-sel1) - (par-sel2))

That’s it. Everything else is the same. Most importantly:

There is only _one_ HDF5 file format.

It is impossible to tell if a given HDF5 file was created by a sequential or parallel application.

Notice that the example is a case of weak scaling: each process writes the same amount of data, and the total amount of data written is proportional to the number of processes. (We speak of strong scaling when the total amount of data written is kept constant, independent of the number of writing MPI processes.)

#include "literate-hdf5.h"
#define SIZE 1024*1024

int main(int argc, char** argv)
{
  int size, rank;
  <<mpi-boilerplate>>

  {
    hid_t fapl, file, dset, file_space;
    float* buffer;
    hsize_t file_size;

    fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL); // (ref:par-fapl)
    file = H5Fcreate("multi-proc.h5", H5F_ACC_TRUNC, H5P_DEFAULT,
                     fapl); // (ref:par-fcrt)

    dset = (*
            <<make-dataset>>) (file, "xMi-floats", size*SIZE); // (ref:par-dcrt)
    file_space = H5Dget_space(dset);
    { // (ref:par-sel1)
      hsize_t start = rank*SIZE, count = 1, block = SIZE;
      H5Sselect_hyperslab(file_space, H5S_SELECT_SET,
                          &start, NULL, &count, &block);
    } // (ref:par-sel2)

    <<create-buffer-and-write>> // (ref:par-wrt)

    <<clean-up>>
  }

  MPI_Finalize(); // (ref:par-mpi-shutdown)
}
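
The same two differences carry over to higher-level bindings. The following is only a sketch: it assumes an MPI-enabled h5py build (the mpio driver) and mpi4py, and it would be launched with something like mpiexec -n 4 python parallel_hello.py (names are illustrative).

# Sketch: requires h5py built with parallel HDF5 support (driver='mpio')
from mpi4py import MPI
import h5py, numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
N = 1024*1024

with h5py.File('multi-proc-py.h5', 'w', driver='mpio', comm=comm) as f:
    # Dataset creation is collective: every rank makes the same call
    dset = f.create_dataset('xMi-floats', (size*N,), dtype='f4')
    # The selection is rank-dependent, just as in the C version
    dset[rank*N:(rank+1)*N] = np.full(N, float(rank), dtype='f4')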

Cloud

The best known example is the Highly Scalable Data Service (HSDS). See John Readey’s presentation.

CAUTION: Working with HDF5 in cloud-based environments means different things to different audiences. Without context, it means just this:

img/cloud-hdf5.png

A Simple Data Analysis

Reference: Reproducible Research with GNU Emacs and Org-mode by Thibault Lestang

The following stochastic differential equation describes a 1D Ornstein-Uhlenbeck process:

\begin{equation} \mathrm{d}x_t = -μ x_t\,\mathrm{d}t + \sqrt{2D}\,\mathrm{d}W_t \end{equation}

$μ &gt; 0$ and $D &gt; 0$ are parameters and $W_t$ denotes the Wiener process.

Simulation

A sample trajectory of the stochastic process can be approximated with a snippet of C++ code.

std::default_random_engine generator;
std::normal_distribution<> distribution{0.0, 1.0};

double dt = 0.1, mu = 0.0, D = 0.5;

double x = 0.0;

for (unsigned i = 0; i < 100; ++i)
  {
    auto t = i*dt;
    auto dw = sqrt(dt)*distribution(generator); // Wiener increment: dW ~ N(0, dt)
    x += (mu - x)*dt + sqrt(2.*D)*dw;           // Euler-Maruyama update
    std::cout << t << " " << x << std::endl;
  }

Visualization<sec:viz>

import numpy as np, matplotlib.pyplot as plt

timeseries = np.array(timeseries)
fig, ax = plt.subplots()
ax.plot(timeseries[:,0], timeseries[:,1])
ax.set_xlabel('t')
ax.set_ylabel('x')
fig.savefig('timeseries_vis.png')
return 'timeseries_vis.png'

img/timeseries_vis.png

Statistics

The following snippet computes the sample mean.

from numpy import array, mean
values = array(x)[:,1]
return mean(values)

After a long and complicated statistical analysis, we conclude that the sample average is call_mean(initial_data) {{{results(-0.023857982999999992)}}}.

Storing the sample trajectory

The following snippet stores our sample as a 100 x 2 2D array.

import h5py, numpy as np

with h5py.File('hello.hdf5', 'a') as f:
    f['t_x'] = np.array(x)
    return 'SUCCESS'
SUCCESS
h5dump -H -d t_x hello.hdf5
HDF5 "hello.hdf5" {
DATASET "t_x" {
   DATATYPE  H5T_IEEE_F64LE
   DATASPACE  SIMPLE { ( 100, 2 ) / ( 100, 2 ) }
}
}

Except for the name of the dataset t_x, it may not be obvious who’s who.

Field names

The following snippet stores our sample as a 100 element 1D array of a compound datatype.

import h5py, numpy as np

dt = np.dtype([("time", np.double), ("position", np.double)])
a = np.array(x)

with h5py.File('hello.hdf5', 'a') as f:
    f.create_dataset("compound", (100,), dtype=dt)
    f['compound'][:,'time'] = a[:,0]
    f['compound'][:,'position'] = a[:,1]
    return 'SUCCESS'
SUCCESS
h5dump -H -d compound hello.hdf5
HDF5 "hello.hdf5" {
DATASET "compound" {
   DATATYPE  H5T_COMPOUND {
      H5T_IEEE_F64LE "time";
      H5T_IEEE_F64LE "position";
   }
   DATASPACE  SIMPLE { ( 100 ) / ( 100 ) }
}
}

Creating a Self-Contained Package

Setting attributes

Let’s make this container more self-documenting by storing the simulation parameters $\mathrm{d}t$, $D$, and $μ$!

import h5py, numpy as np

with h5py.File('hello.hdf5', 'a') as f:
    dset = f["compound"]
    dset.attrs['dt'] = 0.1
    dset.attrs['D'] = 0.5
    dset.attrs['μ'] = 0.0
h5dump -A -d compound hello.hdf5
HDF5 "hello.hdf5" {
DATASET "compound" {
   DATATYPE  H5T_COMPOUND {
      H5T_IEEE_F64LE "time";
      H5T_IEEE_F64LE "position";
   }
   DATASPACE  SIMPLE { ( 100 ) / ( 100 ) }
   ATTRIBUTE "D" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SCALAR
      DATA {
      (0): 0.5
      }
   }
   ATTRIBUTE "dt" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SCALAR
      DATA {
      (0): 0.1
      }
   }
   ATTRIBUTE "μ" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SCALAR
      DATA {
      (0): 0
      }
   }
}
}

Other good candidates for attributes include physical units, calibrations, RNG seeds, etc.
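
As a sketch (the attribute names below are purely illustrative and not prescribed by HDF5 or any standard):

import h5py

with h5py.File('hello.hdf5', 'a') as f:
    dset = f['compound']
    dset.attrs['time_units'] = 'seconds'        # physical units
    dset.attrs['position_units'] = 'arbitrary'  # physical units
    dset.attrs['rng_seed'] = 0                  # reproducibility hint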

Image handling

In this section, we focus on raster images. However, the two approaches presented here apply, mutatis mutandis, to vector images.

We can treat images as blobs or byte sequences (see the section Opaque datasets), or we can treat them as 2D arrays of pixels/color values plus certain metadata, e.g., a palette (see the section Annotated 2D datasets). Whichever approach we choose determines how the images can later be accessed and manipulated.

Opaque datasets<sec:opaque>

#include "literate-hdf5.h"

int main(int argc, char** argv)
{
  size_t size;
  char* buf = (*
               <<read-image-bytes>>) ("./img/timeseries_vis.png", &size);
  printf("%ld\n", size);

  (*
   <<create-and-write-opaque-dset>>) ("hello.hdf5", "bytes", buf, size);

  free(buf);
  return 0;
}

<<read-image-bytes>>

lambda(char*, (const char* name, size_t* size),
       {
         char* result;
         FILE* fp = fopen(name, "rb");
         fseek(fp, 0L, SEEK_END);
         *size = ftell(fp);
         fseek(fp, 0, SEEK_SET);
         result = (char*) malloc(*size);
         fread(result, *size, 1, fp);
         fclose(fp);
         return result;
       })

<<create-and-write-opaque-dset>>

lambda(void,
       (const char* fname, const char* dname,
        const char* buf, size_t size),
       {
         hid_t file = H5Fopen(fname, H5F_ACC_RDWR, H5P_DEFAULT);
         hid_t dtype = H5Tcreate(H5T_OPAQUE, size);
         hid_t dspace = H5Screate(H5S_SCALAR);
         hid_t dset;
         H5Tset_tag(dtype, "image/png");
         dset = H5Dcreate(file, dname, dtype, dspace,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
         H5Dwrite(dset, dtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
         H5Dclose(dset);
         H5Sclose(dspace);
         H5Tclose(dtype);
         H5Fclose(file);
       })
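
For comparison, h5py can store the same kind of opaque blob by wrapping the raw bytes in NumPy's void type. A sketch (the dataset name bytes-py is arbitrary):

import h5py, numpy as np

with open('./img/timeseries_vis.png', 'rb') as img:
    blob = img.read()

with h5py.File('hello.hdf5', 'a') as f:
    f.create_dataset('bytes-py', data=np.void(blob))   # stored as an opaque scalar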

Annotated 2D datasets<sec:transparent>

We use a simple tool, gif2h5, to create a dataset representation conforming to the HDF5 image specification. As a sample image, we use the sample trajectory from section Visualization. Unfortunately, gif2h5 accepts only GIF images, so we first need to convert the PNG file timeseries_vis.png. ImageMagick to the rescue!

convert -version
Version: ImageMagick 6.9.10-23 Q16 x86_64 20190101 https://imagemagick.org
Copyright: © 1999-2019 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP
Delegates (built-in): bzlib djvu fftw fontconfig freetype heic jbig jng jp2 jpeg lcms lqr ltdl lzma openexr pangocairo png tiff webp wmf x xml zlib
convert ./img/timeseries_vis.png timeseries_vis.gif

Now we are ready to call gif2h5.

gif2h5 timeseries_vis.gif timeseries_vis.h5

timeseries_vis.h5 contains a 2D dataset of pixels called Image0 and a 2D palette called global.

h5ls -v timeseries_vis.h5
Opened "timeseries_vis.h5" with sec2 driver.
Image0                   Dataset {480/480, 640/640}
    Attribute: CLASS scalar
        Type:      6-byte null-terminated ASCII string
    Attribute: IMAGE_SUBCLASS scalar
        Type:      14-byte null-terminated ASCII string
    Attribute: IMAGE_VERSION scalar
        Type:      4-byte null-terminated ASCII string
    Attribute: PALETTE scalar
        Type:      object reference
    Location:  1:1400
    Links:     1
    Storage:   307200 logical bytes, 307200 allocated bytes, 100.00% utilization
    Type:      native unsigned char
global                   Dataset {128/128, 3/3}
    Attribute: CLASS scalar
        Type:      8-byte null-terminated ASCII string
    Attribute: PAL_VERSION scalar
        Type:      4-byte null-terminated ASCII string
    Location:  1:800
    Links:     1
    Storage:   384 logical bytes, 384 allocated bytes, 100.00% utilization
    Type:      native unsigned char

Use HDFView to look at the image! (TODO Add a screenshot!)

Finally we copy the pixel and palette datasets to hello.hdf5.

h5copy -v -f ref -i timeseries_vis.h5 -s Image0 -o hello.hdf5 -d Image0

Jamming a text file

Finally, we jam this text file (the Org source of this document) into hello.hdf5 as a so-called user block:

h5jam -i hello.hdf5 -u $ublock --clobber
head -n 10 hello.hdf5
#+TITLE: HDF5 Introduction
#+AUTHOR: Gerd Heber
#+EMAIL: [email protected]
#+CREATOR: <a href="http://www.gnu.org/software/emacs/">Emacs</a> 27.1.90 (<a href="http://orgmode.org">Org</a> mode 9.4.4)
#+DATE: [2021-01-01 Fri]
#+OPTIONS: author:t creator:t email:t toc:nil num:nil

#+PROPERTY: header-args :eval never-export

* Outline

It’s still an HDF5 file:

h5ls -vr hello.hdf5
Opened "hello.hdf5" with sec2 driver.
/                        Group
    Attribute: processor scalar
        Type:      variable-length null-terminated UTF-8 string
    Attribute: release scalar
        Type:      variable-length null-terminated UTF-8 string
    Attribute: system scalar
        Type:      variable-length null-terminated UTF-8 string
    Location:  1:96
    Links:     1
/15                      Group
    Location:  1:1344
    Links:     1
/15/temperature          Dataset {1024/1024}
    Attribute: dt scalar
        Type:      native double
    Location:  1:1072
    Links:     1
    Storage:   8192 logical bytes, 8192 allocated bytes, 100.00% utilization
    Type:      native double
/15/wind                 Dataset {2048/2048}
    Attribute: dt scalar
        Type:      native double
    Location:  1:14992
    Links:     1
    Storage:   16384 logical bytes, 16384 allocated bytes, 100.00% utilization
    Type:      native double
/Image0                  Dataset {480/480, 640/640}
    Attribute: CLASS scalar
        Type:      6-byte null-terminated ASCII string
    Attribute: IMAGE_SUBCLASS scalar
        Type:      14-byte null-terminated ASCII string
    Attribute: IMAGE_VERSION scalar
        Type:      4-byte null-terminated ASCII string
    Attribute: PALETTE scalar
        Type:      object reference
    Location:  1:347520
    Links:     1
    Storage:   307200 logical bytes, 307200 allocated bytes, 100.00% utilization
    Type:      native unsigned char
/compound                Dataset {100/100}
    Attribute: D scalar
        Type:      native double
    Attribute: dt scalar
        Type:      native double
    Attribute: \316\274 scalar
        Type:      native double
    Location:  1:36416
    Links:     1
    Storage:   1600 logical bytes, 1600 allocated bytes, 100.00% utilization
    Type:      struct {
                   "time"             +0    native double
                   "position"         +8    native double
               } 16 bytes
/t_x                     Dataset {100/100, 2/2}
    Location:  1:32768
    Links:     1
    Storage:   1600 logical bytes, 1600 allocated bytes, 100.00% utilization
    Type:      native double
/~obj_pointed_by_347888  Dataset {128/128, 3/3}
    Attribute: CLASS scalar
        Type:      8-byte null-terminated ASCII string
    Attribute: PAL_VERSION scalar
        Type:      4-byte null-terminated ASCII string
    Location:  1:347888
    Links:     2
    Storage:   384 logical bytes, 384 allocated bytes, 100.00% utilization
    Type:      native unsigned char
h5dump -BH hello.hdf5

We can extract the so-called user block at the beginning of the file with h5unjam:

h5unjam -i hello.hdf5 -o no-user-block.h5 -u user-block.org
head -n 10 user-block.org

Appendix <sec:appendix>

Boilerplate and common code <sec:boilerplate>

In this section, we provide the common code snippets for the sequential and parallel examples.

<<mpi-boilerplate>>

This is typical MPI boilerplate. Each MPI process determines the communicator size and its own rank.

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

<<make-dataset>>

To create a dataset (array variable), we need to specify its shape (line (dsp-crt)) and the datatype of its elements (H5T_IEEE_F32LE on line (dst-crt)).

lambda(hid_t, (hid_t file, const char* name, hsize_t elt_count),
       {
         hid_t result;
         hid_t fspace = H5Screate_simple(1, (hsize_t[]) { elt_count },
                                         NULL); // (ref:dsp-crt)
         result = H5Dcreate(file, name, H5T_IEEE_F32LE, fspace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT); // (ref:dst-crt)
         H5Sclose(fspace);
         return result;
       })

<<create-buffer-and-write>>

We create and initialize the data buffer to be written. Its shape is described by its in-memory dataspace mem_space (line (msp)). Since we are writing the entire buffer, we are selecting all elements (line (msp-sel)).

buffer = (float*) malloc(SIZE*sizeof(float));
{ /* Do something interesting with buffer! */
  size_t i;
  for (i = 0; i < SIZE; ++i)
    buffer[i] = (float) i;
}

{
  hid_t mem_space = H5Screate_simple(1, (hsize_t[]) { SIZE }, NULL); // (ref:msp)
  H5Sselect_all(mem_space); // (ref:msp-sel)

  H5Dwrite(dset, H5T_NATIVE_FLOAT, mem_space, file_space, H5P_DEFAULT,
         buffer);

  H5Sclose(mem_space);
}

<<clean-up>>

Adhering to the HDF5 library’s handle discipline is the alpha and $Ω$ of working with the C API. All resources acquired in the course of an application must eventually be released.

H5Pclose(fapl);
free(buffer);
H5Sclose(file_space);
H5Dclose(dset);
H5Fclose(file);