HDF5 Dimension Scales #1313

Open
axelboc opened this issue Dec 6, 2022 · 4 comments
Labels
epic Issue that will need to be split up later on

Comments

@axelboc
Contributor

axelboc commented Dec 6, 2022

It would be nice to support HDF5's dimension scales. We've received multiple feature requests relating to those, including in a couple of recent emails. Dimension scales are apparently used quite extensively in NetCDF4 files, to describe how to plot datasets (axes, units, etc.).

In his email, Jean-Christophe describes, for instance, how to create a time-based dimension scale (which could then be "attached" to a dataset - e.g. values.dims[0].attach_scale(time)).

df_modified = pd.to_datetime(df.index.values) - time # Difference of time from a given date
df_modified_str = df_modified.total_seconds().to_numpy()
time_dset = group.create_dataset ('time', data=df_modified_str)
time_dset.attrs["long_name"] = "UTC Time"
time_dset.attrs["description"] = ModelHDFLevel4Lumina.TIME.__doc__    
time_dset.attrs["calendar"] = "standard"
time_dset.attrs["units"] = f"seconds since {time.strftime('%Y-%m-%d %H:%M:%S')}"
time_dset.make_scale('time')

More reading:

@axelboc
Contributor Author

axelboc commented Nov 7, 2023

[email protected] brings support for reading dimension scales.

@zhqrbitee

Hi team, we have some applications that store a waveform (time array + value array), and the time array is usually not uniformly sampled, to reduce the waveform size. It would be nice to have this feature so that we can visualize the waveform.

@axelboc
Contributor Author

axelboc commented Oct 23, 2024

@zhqrbitee any chance you could share a sample file with us?

@NAThompson

@axelboc: Here is code that produces an HDF5 file using a dimension scale to link two datasets. The scale gives the plotting code (matplotlib here) the requisite metadata to understand that the data structure is (times, values) and should be plotted as such:

#!/usr/bin/env python3

import h5py
import numpy
import matplotlib.pyplot as plt
from math import pi as π

def chirp(t: float):
    f0 = 1e4
    c = 3e8
    φ0 = 0.0
    return numpy.sin(φ0 + 2 * π * (c * t * t / 2 + f0 * t))

def create_nonuniform_timeseries():
    # N.B.: This is to emulate a more realistic goal (plotting the output of an adaptive ODE stepper)
    # without a huge amount of code:
    times = numpy.random.uniform(0.0, 1e-3/2, 10000)
    times = numpy.sort(times)
    values = chirp(times)

    with h5py.File('chirp.h5', 'w') as f:
        # Create the time dataset and add dimension scale and units
        time_ds = f.create_dataset('times', data=times)
        time_ds.attrs['units'] = 'seconds'
        time_ds.make_scale('times')

        # Create the values dataset, and attach the time dataset as its dimension scale
        values_ds = f.create_dataset('values', data=values)
        values_ds.attrs['units'] = 'dimensionless'
        values_ds.dims[0].attach_scale(time_ds)

        print("Created 'chirp.h5' with times and values datasets.")

def read_and_plot_timeseries(filename='chirp.h5'):
    with h5py.File(filename, 'r') as f:
        # Find the 'values' dataset and check its dimension scale
        values_ds = f['values']
        values = values_ds[:]

        # Iterate over the scale datasets attached to the first dimension.
        # .values() yields the datasets themselves, so the lookup does not
        # rely on the scale name matching the dataset path.
        for time_ds in values_ds.dims[0].values():
            times = time_ds[:]  # Read the time values

            # Now plot the data
            plt.figure(figsize=(10, 6))
            plt.plot(times, values, label='Chirp Signal')
            # Use double quotes outside so the nested single quotes parse on Python < 3.12
            plt.xlabel(f"Time ({time_ds.attrs['units']})")
            plt.ylabel(f"Values ({values_ds.attrs['units']})")
            plt.title('Chirp Signal vs. Time')
            plt.grid(True)
            plt.legend()
            plt.tight_layout()
            plt.show()
            break
        else:
            print("No dimension scale found for the 'values' dataset.")



if __name__ == '__main__':
    create_nonuniform_timeseries()
    read_and_plot_timeseries()

Running h5dump chirp.h5 demonstrates that this file does indeed have the desired metadata:

times
...
...
      ATTRIBUTE "REFERENCE_LIST" {
         DATATYPE  H5T_COMPOUND {
            H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
            H5T_STD_U32LE "dimension";
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): {
               DATASET 105553182819040 "/values",
               0
            }
         }
      }
...
values:
...
...
      ATTRIBUTE "DIMENSION_LIST" {
         DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT } }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): (DATASET 105553182834848 "/times")
         }
      }
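
The same link can also be checked from Python rather than `h5dump`. The following is a minimal sketch, assuming a file laid out like `chirp.h5` above; the helper name `describe_scales` is hypothetical and only wraps h5py's `Dataset.dims` interface.

```python
import h5py
import numpy as np


def describe_scales(f, name):
    """Return {axis index: [paths of attached scale datasets]} for dataset `name`.

    Hypothetical helper built on h5py's Dataset.dims interface.
    """
    dset = f[name]
    return {
        axis: [scale.name for scale in dset.dims[axis].values()]
        for axis in range(dset.ndim)
    }
```

For the file produced above, `describe_scales(f, 'values')` should map axis 0 to `/times`, matching the `DIMENSION_LIST` entry in the dump.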
