
Append a Dataset of References #1135

Merged: 39 commits merged into dev from zarr_append on Aug 22, 2024
Conversation

@mavaylon1 (Contributor) commented Jun 27, 2024

Motivation

What was the reasoning behind this change? Please explain the changes briefly.

How to test the behavior?

Show how to reproduce the new behavior (can be a bug fix or a new feature)

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Does the PR clearly describe the problem and the solution?
  • Have you reviewed our Contributing Guide?
  • Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?

codecov bot commented Jul 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.91%. Comparing base (acc3d78) to head (d5ad0e4).
Report is 1 commit behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1135      +/-   ##
==========================================
+ Coverage   88.89%   88.91%   +0.01%     
==========================================
  Files          45       45              
  Lines        9844     9857      +13     
  Branches     2799     2802       +3     
==========================================
+ Hits         8751     8764      +13     
  Misses        776      776              
  Partials      317      317              


@mavaylon1 mentioned this pull request Jul 3, 2024
@mavaylon1 changed the title from "Zarr Append a Dataset of References" to "Append a Dataset of References" Jul 8, 2024
@mavaylon1 marked this pull request as ready for review July 13, 2024 18:45
@mavaylon1 requested a review from rly July 13, 2024 18:45
docs/source/install_users.rst (review comment; outdated, resolved)
@rly (Contributor) commented Jul 13, 2024

Minor suggestion to a test. Looks good otherwise.

@mavaylon1 requested a review from rly July 23, 2024 01:26

@rly (Contributor) commented Jul 25, 2024

I added a test that raises an unexpected error:

self = <Closed HDF5 file>, name = <HDF5 object reference (null)>

    @with_phil
    def __getitem__(self, name):
        """ Open an object in the file """
    
        if isinstance(name, h5r.Reference):
            oid = h5r.dereference(name, self.id)
            if oid is None:
>               raise ValueError("Invalid HDF5 object reference")
E               ValueError: Invalid HDF5 object reference

We just chatted in person, but noting here that you were going to take a look at it.

@mavaylon1 (Contributor, Author) commented

@rly There may be a workaround, but I think the problem below might be reason enough to just start on the proxy idea.

def append(self, arg):
    # Walk up the parent hierarchy to find the root container
    child = arg
    while True:
        if child.parent is not None:
            parent = child.parent
            child = parent
        else:
            parent = child
            break

    # Build the root builder first, then the builder for the object being appended
    self.io.manager.build(parent)
    builder = self.io.manager.build(arg)

    # Create Reference
    ref = self.io._create_ref(builder)
    append_data(self.dataset, ref)

When a user calls append on a dataset of references, we build the root builder first. We then call _create_ref, which tries to create a reference via return self.__file[path].ref. This fails with KeyError: 'Unable to open object "new"', because it is trying to create a reference to an object, i.e., the new baz, within the file; however, that object is not in the file. It is only in the root builder.

Why isn't it in the file? We are in append mode, right? Let's ignore the reference for now and just add the new baz. It works (sort of): when you read the file back, the new baz is not there. We need to call write again; once you do that, it is there. That means that when we try to create a reference, it looks for the new baz in the file and finds nothing, because the object is not added until write (which we never call during append).
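
To make the failure concrete, here is a minimal sketch of the scenario (the Baz container, the add_baz call, and the read_bucket1 name follow the round-trip test discussed below and are illustrative, not an exact reproduction):

new_baz = Baz(name="new")              # a brand-new group-level container
read_bucket1.add_baz(new_baz)          # exists only in the in-memory builders, not in the HDF5 file
read_bucket1.baz_data.append(new_baz)  # _create_ref looks for "new" in the file and fails
# the new baz only lands in the file after write(...) is called again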

Earlier I said in conversation that you do not need to call write. That is half true. In my earlier method (seen below), you do not need to call write to append to a dataset of references, but you do need to call write to add a new baz, because the baz itself is a new group.

Earlier I had

def append(self, arg):
    # Get Builder
    builder = self.io.manager.build(arg)

    # Get HDF5 Reference
    ref = self.io._create_ref(builder)
    append_data(self.dataset, ref)

This leads to a reference being created, but one that the test self.assertIs(read_bucket1.baz_data.data[10], read_bucket1.bazs["new"]) does not find, because the reference path is just '/'. This is wrong; it needs to be '/bazs/new'.
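
A quick way to inspect the symptom with plain h5py (the file name and dataset path here are illustrative, not the actual test fixture):

import h5py

with h5py.File("test_append.h5", "r") as f:
    ref = f["baz_data"][10]   # the appended HDF5 object reference
    print(f[ref].name)        # path the reference resolves to; it should be '/bazs/new'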

Note: yes, this is the same code as in hdmf-zarr. I started to wonder if this could just live in hdmf, because append calls _create_ref, which means all we need is a unique _create_ref method per backend. In other words, we would not need a zarr PR that duplicates this logic, probably just some name changes.

@rly (Contributor) commented Jul 29, 2024

I see. Tricky indeed. You can't create an HDF5 reference to an object that isn't in the file yet, and rebuilding the whole hierarchy on each append is not ideal. A proxy makes sense. I can't think of another workaround without severely limiting and documenting the ways in which you cannot append.

> Note: yes, this is the same code as in hdmf-zarr. I started to wonder if this could just live in hdmf, because append calls _create_ref, which means all we need is a unique _create_ref method per backend. In other words, we would not need a zarr PR that duplicates this logic, probably just some name changes.

That sounds useful to look into. You may be able to refactor it and some fields into the base HDMFDataset class.
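
For example, a minimal sketch of that refactor, assuming hypothetical class and helper names (this is not the actual hdmf API):

from abc import ABC, abstractmethod


def append_data(dataset, value):
    # Stand-in for the append_data helper used in the snippets above:
    # grow the backing dataset by one element and store the new value.
    dataset.resize((len(dataset) + 1,))
    dataset[-1] = value


class AbstractReferenceDataset(ABC):
    # Hypothetical shared base: the backend-agnostic append logic lives here,
    # and each backend (HDF5, Zarr) overrides only the reference creation.

    def __init__(self, dataset, io):
        self.dataset = dataset
        self.io = io

    def append(self, arg):
        builder = self.io.manager.build(arg)
        ref = self._create_ref(builder)  # backend-specific
        append_data(self.dataset, ref)

    @abstractmethod
    def _create_ref(self, builder):
        """Return a backend-specific reference (e.g., an HDF5 object reference)."""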

@mavaylon1 (Contributor, Author) commented

Add documentation here: NeurodataWithoutBorders/pynwb#1951

src/hdmf/query.py (review comment; outdated, resolved)
@mavaylon1 merged commit e0bedca into dev Aug 22, 2024
29 checks passed
@mavaylon1 deleted the zarr_append branch August 22, 2024 15:45