Streamed writes #9

jasonGranholt opened this issue Feb 7, 2022 · 2 comments

jasonGranholt commented Feb 7, 2022

Hi Brian, as mentioned by email earlier, we are working on streamed writes. We need this to convert large binary files in a web app with memory limitations.

We have already made a version in Python, where we parse the data blocks, but we are planning to use JavaScript/TypeScript.

Attached you can find our example using the netCDF4 library: we append new data from the stream, extending the NetCDF4 file. See the attached source code; most of the logic is in main(), but some other functions are included.
createNetCDF4.zip

Kind Regards, Jason


bmaranville commented Feb 7, 2022

This is certainly doable. I don't have time to work on it right now, but these steps would be required (a rough sketch of the resulting API follows the list):

  • add chunk parameters to Group.create_dataset (you would set the chunk size to match the chunks coming from your conversion function)
  • modify the C++ code to use those parameters when creating a dataset
  • add compression parameters to Group.create_dataset (so you don't use up all your memory storing the uncompressed data in the HDF5 file, which lives in memory in the browser...)
  • modify the C++ code to use those compression parameters when creating a dataset (ZLIB compression is available built-in)
  • add a Dataset.write(data: Array | ArrayBufferView, slice: Array<Array<number>>) function
  • add a corresponding function in the C++ code. Creating a slice selection is already done for read, so it should be straightforward to create similar code for write.

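As a rough illustration, here is a TypeScript sketch of how those pieces might fit together once added. None of this is current h5wasm API: the options argument on create_dataset (chunks, compression) and the write call are hypothetical, with only the write signature taken from the list above.

```typescript
// Hypothetical sketch only: the chunks/compression options and Dataset.write
// are the proposed additions from the list above, not current h5wasm API.
import h5wasm from "h5wasm";

await h5wasm.ready;
const file = new h5wasm.File("converted.h5", "w");

// Proposed: pass chunk and compression parameters at creation time.
// chunks should match the block size your conversion function emits;
// compression would select the built-in ZLIB filter (level 0-9).
const dset = file.create_dataset(
  "data",
  null,                                  // no initial data; allocate shape only
  [1024, 256],                           // full shape, known ahead of time
  "<f4",
  { chunks: [64, 256], compression: 6 }  // hypothetical new options
);

// Proposed: Dataset.write(data, slice) writes one chunk-aligned block,
// with [start, stop] pairs per dimension mirroring the read selection.
const block = new Float32Array(64 * 256);
dset.write(block, [[0, 64], [0, 256]]);

file.close();
```
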
It is certainly preferable to do this knowing ahead of time how much data will be stored in the HDF5 dataset. If that is not known, then additionally:

  • add maxsize parameter to dataset creation, both in javascript and C++
  • add Dataset.resize method in javascript and C++

Your conversion program would then write the data one chunk at a time, choosing a slice that corresponds to the chunk size at a whole-chunk offset, as in the sketch below.
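
A minimal sketch of that streaming loop, again assuming the hypothetical maxsize/resize/write additions above; readBlocks() is a stand-in for whatever block source the conversion program provides:

```typescript
// Hypothetical sketch only: the maxsize option, Dataset.resize, and
// Dataset.write are the proposed additions, not current h5wasm API.
import h5wasm from "h5wasm";

const CHUNK_ROWS = 64;
const COLS = 256;

// Stand-in for the conversion program's block source (hypothetical helper).
async function* readBlocks(): AsyncIterable<Float32Array> {
  yield new Float32Array(CHUNK_ROWS * COLS); // one chunk-sized block per yield
}

await h5wasm.ready;
const file = new h5wasm.File("converted.h5", "w");

const dset = file.create_dataset(
  "data",
  null,                        // no initial data
  [0, COLS],                   // start empty along the streamed axis
  "<f4",
  {
    chunks: [CHUNK_ROWS, COLS],
    maxsize: [null, COLS],     // hypothetical: unlimited along axis 0
    compression: 6,
  }
);

let rows = 0;
for await (const block of readBlocks()) {
  dset.resize([rows + CHUNK_ROWS, COLS]);    // grow before each write
  // whole-chunk offset: each write fills exactly one chunk
  dset.write(block, [[rows, rows + CHUNK_ROWS], [0, COLS]]);
  rows += CHUNK_ROWS;
}

file.close();
```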

@bmaranville

I can't tell based on a quick reading of what you sent, but if your datasets use Compound datatypes, the ability to initialize and write those would also have to be added to h5wasm (currently it can read, but not write, Compound types).
