Support viewing JLD2 files #46

Closed
hz-xiaxz opened this issue Aug 21, 2024 · 9 comments · Fixed by #48

Comments

@hz-xiaxz

Is your feature request related to a problem?

JLD2 is a file format native to the Julia programming language, and it is heavily used in high-performance scientific programs (see JLD2).

JLD2 is designed to comprise a subset of HDF5. I'm no expert in this repository, but I think it should be feasible to support JLD2 through the same interface.
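
Since the claim above rests on JLD2 files being valid HDF5, here is a minimal sketch of how one might check whether any file carries an HDF5 superblock. It relies only on the HDF5 file format rule that the 8-byte signature sits at byte offset 0, 512, 1024, 2048, and so on. The function name `find_hdf5_superblock` is hypothetical and is not part of JLD2.jl or HDF5.jl.

```julia
# The 8-byte signature that starts every HDF5 superblock: \x89 H D F \r \n \x1a \n
const HDF5_SIGNATURE = UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a]

# Hypothetical helper: return the byte offset of the HDF5 superblock,
# or `nothing` if no signature is found at any allowed offset.
function find_hdf5_superblock(path::AbstractString)
    open(path, "r") do io
        len = filesize(path)
        offset = 0
        while offset + 8 <= len
            seek(io, offset)
            read(io, 8) == HDF5_SIGNATURE && return offset
            # Allowed offsets are 0, 512, 1024, 2048, ... (doubling each time)
            offset = offset == 0 ? 512 : 2 * offset
        end
        return nothing
    end
end
```

Calling `find_hdf5_superblock("example.jld2")` on a file of your own should return an integer offset if the file is HDF5-compatible. This only checks the signature; it says nothing about whether a given viewer can decode every message in the file.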

Alternatives you've considered

The authors of JLD2.jl are eager for a VS Code extension to visualize JLD2 files; see the discussions below:
https://discourse.julialang.org/t/jld2-preview-in-vscode/80050/6
julia-vscode/julia-vscode#2863

Additional context

Currently, opening a JLD2 file with H5Web gives the error message below. Any hints on how to solve it?

HDF5-DIAG: Error detected in HDF5 (1.14.2) thread 0:
  #000: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5D.c line 403 in H5Dopen2(): unable to synchronously open dataset major: Dataset minor: Can't open object
  #001: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5D.c line 364 in H5D__open_api_common(): unable to open dataset major: Dataset minor: Can't open object
  #002: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLcallback.c line 1980 in H5VL_dataset_open(): dataset open failed major: Virtual Object Layer minor: Can't open object
  #003: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLcallback.c line 1947 in H5VL__dataset_open(): dataset open failed major: Virtual Object Layer minor: Can't open object
  #004: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLnative_dataset.c line 321 in H5VL__native_dataset_open(): unable to open dataset major: Dataset minor: Can't open object
  #005: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dint.c line 1429 in H5D__open_name(): can't open dataset major: Dataset minor: Unable to initialize object
  #006: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dint.c line 1494 in H5D_open(): not found major: Dataset minor: Object not found
  #007: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dint.c line 1689 in H5D__open_oid(): unable to load type info from dataset header major: Dataset minor: Unable to initialize object
  #008: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Omessage.c line 432 in H5O_msg_read(): unable to read object header message major: Object header minor: Read failed
  #009: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Omessage.c line 487 in H5O_msg_read_oh(): unable to decode message major: Object header minor: Unable to decode value
  #010: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Oshared.h line 61 in H5O__dtype_shared_decode(): unable to decode shared message major: Object header minor: Unable to decode value
  #011: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Oshared.c line 358 in H5O__shared_decode(): unable to retrieve native message major: Object header minor: Read failed
  #012: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Oshared.c line 172 in H5O__shared_read(): unable to read message major: Object header minor: Read failed
  #013: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Omessage.c line 432 in H5O_msg_read(): unable to read object header message major: Object header minor: Read failed
  #014: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Omessage.c line 487 in H5O_msg_read_oh(): unable to decode message major: Object header minor: Unable to decode value
  #015: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Oshared.h line 74 in H5O__dtype_shared_decode(): unable to decode native message major: Object header minor: Unable to decode value
  #016: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Odtype.c line 1338 in H5O__dtype_decode(): can't decode type major: Datatype minor: Unable to decode value
  #017: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Odtype.c line 154 in H5O__dtype_decode_helper(): bad version number for datatype message major: Datatype minor: Unable to load metadata into cache

@mkitti

mkitti commented Aug 21, 2024

Do you have a sample file available for testing?

@JonasIsensee

JonasIsensee commented Aug 22, 2024

EDIT:
This is actually not really a problem with JLD2.
The H5Web viewer cannot cope with committed datatypes (datatypes that are linked from groups).
Demo:

julia> using HDF5

julia> struct AStruct
           x::Int64
           y::Float64
       end

julia> f = h5open("test.h5", "w")
🗂️ HDF5.File: (read-write) test.h5

julia> g = create_group(f, "types")
📂 HDF5.Group: /types (file: test.h5)

julia> t = commit_datatype(f, "types/AStruct", datatype(AStruct))
HDF5.Datatype: /types/AStruct H5T_COMPOUND {
      H5T_STD_I64LE "x" : 0;
      H5T_IEEE_F64LE "y" : 8;
   }
julia> d = create_dataset(f, "data", t, (1,1))
🔢 HDF5.Dataset: /data (file: test.h5 xfer_mode: 0)

julia> d[1,1] = AStruct(1,2)
AStruct(1, 2.0)

julia> close(f)

shell> h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   DATASET "data" {
      DATATYPE  "/types/AStruct"
      DATASPACE  SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
      DATA {
      (0,0): {
            1,
            2
         }
      }
   }
   GROUP "types" {
      DATATYPE "AStruct" H5T_COMPOUND {
         H5T_STD_I64LE "x";
         H5T_IEEE_F64LE "y";
      }
   }
}
}

I also opened an issue here: silx-kit/h5web#1699

@axelboc
Contributor

axelboc commented Aug 22, 2024

We will definitely strive to support JLD2 files. Thanks for opening an issue in the H5Web repo. As explained, the problem comes from h5wasm not currently supporting committed datatypes. I've opened an issue on the h5wasm repo too: usnistgov/h5wasm#80

@loichuder
Member

Thanks to the work of @bmaranville and @axelboc, we made good progress towards this. I could read this simple JLD2 file without issue using the main branch of H5Web: example.zip

Generation script
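# Inspired by https://github.com/JuliaIO/JLD2.jl?tab=readme-ov-file#jld2
using JLD2

# x, y and z are assumed to be defined beforehand, as in the JLD2 README
jldsave("example.jld2"; x, y, z)

# Reopen the file and append a compressed array
jldopen("example.jld2", "r+"; compress = true) do f
    f["large_array"] = zeros(10000)
end

# Reopen the file and add a group containing a single value
jldopen("example.jld2", "r+") do file
    mygroup = JLD2.Group(file, "mygroup")
    mygroup["mystuff"] = 42
end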

@JonasIsensee

I can confirm!
However, your example file does not actually rely on the fix, since it does not use compound datatypes.
The fact that the following file now works is quite impressive:

julia> using JLD2

julia> struct InnerStruct
           x::String
           y::Int
       end

julia> struct OuterStruct
           a::Int
           b::InnerStruct
           c::NTuple{3,Int}
       end

julia> jldsave("nested_compound.jld2"; data=OuterStruct(1, InnerStruct("two",3),(4,5,6)))

nested_compound.zip

@loichuder
Member

Thanks for double-checking 🙂

I figured that the file was too simple so I hope to get a more complex example from people there: julia-vscode/julia-vscode#2863 (comment)

@JonasIsensee

JonasIsensee commented Sep 10, 2024

Here are two more files which are conceptually interesting:
JLD2 uses HDF5 references to refer to mutable fields inside structs and tracks object identities while saving and loading.
This allows storing and loading recursive structures and preserves object identity.
In the second example, the array is encoded and loaded only once. Both struct fields refer to the same memory after loading.

H5Web currently gives you no way to view / de-reference linked datasets in this way.
That avoids all the potential pitfalls of circular references but also prevents viewing some of the data encoded in JLD2.
Importantly though, it does not error even with files like this.

In my eyes, this is good enough for a release at this stage. Future feature ideas might be enhanced pretty printing for compound types and enabling the manual loading of referenced datasets.

julia> using JLD2

julia> mutable struct RecursiveStruct
           x::Float64
           y::RecursiveStruct
           RecursiveStruct(x)=new(x)
           RecursiveStruct(x,y)=new(x,y)
       end


julia> r = RecursiveStruct(1)
RecursiveStruct(1.0, #undef)

julia> r2 = RecursiveStruct(2, r)
RecursiveStruct(2.0, RecursiveStruct(1.0, #undef))

julia> r.y = r2
RecursiveStruct(2.0, RecursiveStruct(1.0, RecursiveStruct(#= circular reference @-2 =#)))

julia> jldsave("recursive.jld2"; r)

julia> load("recursive.jld2")
Dict{String, Any} with 1 entry:
  "r" => RecursiveStruct(1.0, RecursiveStruct(2.0, RecursiveStruct(#= circular … =#)))

julia> struct ObjIDPreservation
           arr1::Vector{Int}
           arr2::Vector{Int}
       end

julia> arr = [1,2,3,4,5,6]
6-element Vector{Int64}:
 1
 2
 3
 4
 5
 6

julia> obj = ObjIDPreservation(arr, arr)
ObjIDPreservation([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])

julia> obj.arr1 === obj.arr2 # references the same memory
true

julia> jldsave("objidpreservation.jld2"; obj)

julia> data = load("objidpreservation.jld2", "obj")
ObjIDPreservation([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])

julia> data.arr1 === data.arr2
true

objidpreservation.zip
recursive.zip
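
As an illustration of the circular-reference pitfall mentioned above, here is a minimal sketch of how a viewer might follow references without looping forever. It is plain Julia and does not reflect how H5Web or JLD2 work internally. `Node` and `follow_refs` are hypothetical names, and `Node` merely stands in for a dataset that holds a reference to another dataset.

```julia
# Hypothetical stand-in for a dataset that references another dataset.
mutable struct Node
    name::String
    ref::Union{Node,Nothing}
end

# Follow references starting at `start`, stopping as soon as an object
# that was already visited is reached again.
function follow_refs(start::Node)
    seen = IdDict{Node,Bool}()   # keyed by object identity, not by value
    path = String[]
    node = start
    while node !== nothing && !haskey(seen, node)
        seen[node] = true
        push!(path, node.name)
        node = node.ref
    end
    return path
end

a = Node("a", nothing)
b = Node("b", a)
a.ref = b                  # a -> b -> a, a cycle like the one in recursive.jld2
visited = follow_refs(a)   # each node is visited once
```

Tracking visited objects by identity is the same idea JLD2 is described as using when it saves and loads: an object that has already been seen is not processed a second time.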

@hz-xiaxz
Author

Sorry, I don't really know how to use the main-branch version of H5Web, but I think the tests above cover most of the JLD2 files I will use.

@axelboc
Contributor

axelboc commented Sep 16, 2024

#47 improves support for committed datatypes and #48 allows JLD2 files to open directly in H5Web. This should be sufficient to bring basic support for most JLD2 files and resolve this issue.

I've started a discussion thread in the H5Web repo with improvement ideas mentioned in this issue so as to not lose track of them. Feel free to continue the discussion and/or create proper feature requests for these ideas over there.
