Streamed writes #9

jasonGranholt opened this issue Feb 7, 2022 · 2 comments

jasonGranholt commented Feb 7, 2022

Hi Brian, as mentioned by email earlier, we are working on streamed writes. We need this to convert large binary files in a web app with memory limitations.

We have already made a version in Python, where we parse the data blocks, but we are planning to use JavaScript/TypeScript.

Attached you can find our example using the netCDF4 library: we append new data from the stream, extending the NetCDF4 file. See the attached source code; most of the logic is in main(), but some other functions are included.
createNetCDF4.zip

Kind Regards, Jason


bmaranville commented Feb 7, 2022

This is certainly doable. I don't have time to work on it right now, but these steps would be required (a rough sketch of the resulting API follows the list):

  • add chunk parameters to Group.create_dataset (you would set the chunk size to match the chunks coming from your conversion function)
  • modify the C++ code to use those parameters when creating a dataset
  • add compression parameters to Group.create_dataset (so you don't use up all your memory storing the uncompressed data in the HDF5 file, which lives in memory in the browser...)
  • modify the C++ code to use those compression parameters when creating a dataset (ZLIB compression is available built-in)
  • add a Dataset.write(data: Array | ArrayBufferView, slice: Array<Array<number>>) function
  • add a corresponding function in the C++ code. Creating a slice selection is already done for read, so it should be straightforward to create similar code for write.

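As a rough illustration, here is a TypeScript sketch of how those pieces might fit together once added. None of this is current h5wasm API: the options argument on create_dataset (chunks, compression) and the write call are hypothetical, with only the write signature taken from the list above.

```typescript
// Hypothetical sketch only: the chunks/compression options and Dataset.write
// are the proposed additions from the list above, not current h5wasm API.
import h5wasm from "h5wasm";

await h5wasm.ready;
const file = new h5wasm.File("converted.h5", "w");

// Proposed: pass chunk and compression parameters at creation time.
// chunks should match the block size your conversion function emits;
// compression would select the built-in ZLIB filter (level 0-9).
const dset = file.create_dataset(
  "data",
  null,                                  // no initial data; allocate shape only
  [1024, 256],                           // full shape, known ahead of time
  "<f4",
  { chunks: [64, 256], compression: 6 }  // hypothetical new options
);

// Proposed: Dataset.write(data, slice) writes one chunk-aligned block,
// with [start, stop] pairs per dimension mirroring the read selection.
const block = new Float32Array(64 * 256);
dset.write(block, [[0, 64], [0, 256]]);

file.close();
```
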
It is certainly preferable to do this knowing ahead of time how much data will be stored in the HDF5 dataset. If that is not known, then additionally:

  • add maxsize parameter to dataset creation, both in javascript and C++
  • add Dataset.resize method in javascript and C++

Your conversion program would then write the data one chunk at a time, choosing a slice that corresponds to the chunk size at a whole-chunk offset, as in the sketch below.
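
A minimal sketch of that streaming loop, again assuming the hypothetical maxsize/resize/write additions above; readBlocks() is a stand-in for whatever block source the conversion program provides:

```typescript
// Hypothetical sketch only: the maxsize option, Dataset.resize, and
// Dataset.write are the proposed additions, not current h5wasm API.
import h5wasm from "h5wasm";

const CHUNK_ROWS = 64;
const COLS = 256;

// Stand-in for the conversion program's block source (hypothetical helper).
async function* readBlocks(): AsyncIterable<Float32Array> {
  yield new Float32Array(CHUNK_ROWS * COLS); // one chunk-sized block per yield
}

await h5wasm.ready;
const file = new h5wasm.File("converted.h5", "w");

const dset = file.create_dataset(
  "data",
  null,                        // no initial data
  [0, COLS],                   // start empty along the streamed axis
  "<f4",
  {
    chunks: [CHUNK_ROWS, COLS],
    maxsize: [null, COLS],     // hypothetical: unlimited along axis 0
    compression: 6,
  }
);

let rows = 0;
for await (const block of readBlocks()) {
  dset.resize([rows + CHUNK_ROWS, COLS]);    // grow before each write
  // whole-chunk offset: each write fills exactly one chunk
  dset.write(block, [[rows, rows + CHUNK_ROWS], [0, COLS]]);
  rows += CHUNK_ROWS;
}

file.close();
```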

@bmaranville

I can't tell based on a quick reading of what you sent, but if your datasets use Compound datatypes, the ability to initialize and write those would also have to be added to h5wasm (currently it can read, but not write, Compound types).
