Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix linking for FSSpec #138

Merged
merged 14 commits into from
Nov 17, 2023
Merged

Conversation

alejoe91
Copy link
Collaborator

@alejoe91 alejoe91 commented Nov 10, 2023

Motivation

The ZarrIO.resolve_ref() function assumes paths are local. This PR adds a method is_remote() to check if the zarr.store is FSStore and handles remote paths and links correctly.

Fixes #134

How to test the behavior?

from hdmf_zarr import NWBZarrIO

remote_zarr_location = "s3://aind-open-data/ecephys_625749_2022-08-03_15-15-06_nwb_2023-05-16_16-34-55/ecephys_625749_2022-08-03_15-15-06_nwb/ecephys_625749_2022-08-03_15-15-06_experiment1_recording1.nwb.zarr/"

with NWBZarrIO(remote_zarr_location, "r") as io:
    nwbfile = io.read()

print(nwbfile)

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running ruff from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@alejoe91
Copy link
Collaborator Author

Should I add a test?

@codecov-commenter
Copy link

codecov-commenter commented Nov 10, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (b51e127) 86.10% compared to head (c1204f5) 86.24%.
Report is 4 commits behind head on dev.

Files Patch % Lines
tests/unit/utils.py 71.42% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #138      +/-   ##
==========================================
+ Coverage   86.10%   86.24%   +0.13%     
==========================================
  Files          13       14       +1     
  Lines        3189     3221      +32     
==========================================
+ Hits         2746     2778      +32     
  Misses        443      443              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alejoe91
Copy link
Collaborator Author

Should I add a test?

Added one using the Allen Institute dataset. Not sure if and where you guys want to add it to the CI, since it's currently quite slow. Maybe it would be better to keep this as a place holder and run the test once we have a small test file stored somewhere?

@oruebel
Copy link
Contributor

oruebel commented Nov 10, 2023

Thanks for taking a stab at this! We can work on getting a small file for testing on DANDI hopefully while we are at SfN. The PR looks overall good to me.

@oruebel
Copy link
Contributor

oruebel commented Nov 12, 2023

@alejoe91 When running the test locally, I get the following error:

FAILED tests/unit/test_fsspec_streaming.py::TestFSSpecStreaming::test_fsspec_streaming - botocore.exceptions.NoCredentialsError: Unable to locate credentials

Is there a way to avoid the need for credentials in the test and/or would using a file on DANDI fix this?

I made a few minor changes to fix flake8 warnings (to avoid unused import warnings) and adding fsspec and s3fs as optional requirements.

With regard to the time it takes for the test, I think this is Ok for now. I.e., I would be in favor of merging the PR and creating a separate issue to for creating a smaller test file to speed up the test. I.e., aside from the credential issues with the test, this PR looks good to me.

@oruebel
Copy link
Contributor

oruebel commented Nov 12, 2023

The credential error also occured in the GitHub action here:

https://github.com/hdmf-dev/hdmf-zarr/actions/runs/6838973516/job/18596637194?pr=138#step:7:119

@alejoe91
Copy link
Collaborator Author

I see! I think I haven't propagated the storage_options parameter to all zarr.open()! I'll push a fix tomorrow

src/hdmf_zarr/backend.py Outdated Show resolved Hide resolved
src/hdmf_zarr/backend.py Outdated Show resolved Hide resolved
src/hdmf_zarr/backend.py Outdated Show resolved Hide resolved
@oruebel
Copy link
Contributor

oruebel commented Nov 14, 2023

Just for reference, to help create a test with DANDI:

Example code how to get a S3 URL for Zarr read from DANDI

from dandi.dandiapi import DandiAPIClient
from dandi.consts import known_instances

staging_instance = known_instances["dandi-staging"]
dandiset_id = "204919"
version_id = "draft"

client = DandiAPIClient(dandi_instance=staging_instance)
dandiset = client.get_dandiset(dandiset_id=dandiset_id, version_id=version_id)
zarr_asset = dandiset.get_asset_by_path(path="test_read_nwbfile/test2.nwb.zarr")
metadata = zarr_asset.get_metadata()
content = metadata.contentUrl[1]
s3_url = f"s3://dandiarchive{content.path}"

oruebel
oruebel previously approved these changes Nov 14, 2023
Copy link
Contributor

@oruebel oruebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alejoe91 !

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@oruebel oruebel merged commit 7555512 into hdmf-dev:dev Nov 17, 2023
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: S3 streaming support via fsspec
3 participants