fsspec URL chaining #28

Open
brl0 opened this issue Aug 4, 2021 · 3 comments
Labels
enhancement 🚀 New feature or request

@brl0
Contributor

brl0 commented Aug 4, 2021

This is not urgent, but based on my very limited and quick testing, it seems that fsspec URL chaining may not work properly.
It is possible I was not doing it properly, or there may be some clever way to work around the limitation, but I thought it was worth raising an issue for now.

ap-- added the enhancement 🚀 New feature or request label on Aug 2, 2023
@ap--
Collaborator

ap-- commented Sep 30, 2023

Note: a url-chaining implementation should also support other chaining styles, see: zarr-developers/zeps#48

@mraspaud

Hi, I'm interested in getting this functionality. Is there a lot left to do for this? If not, I could take a look.

@ap--
Collaborator

ap-- commented Jun 1, 2024

Thank you for offering to contribute!

To implement url chaining support two items have to be completed:

  1. We need to first parse the chained url into (protocol, path, and storage options) for each protocol in the chain. All of this functionality is already available in filesystem_spec, and the function that does this in upath should rely as much as possible on the filesystem_spec implementation.
  2. The UPath class should get an attribute (name up for debate) .chain that acts much like UPath.parents but provides access to the individual links/filesystems of the chained url. To make this work correctly, a upath instance would have to keep track of the (protocol, path, storage options) tuples before and after the current filesystem.
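For item 1, here is a minimal pure-Python sketch of what the parsing step has to produce. It is only a stand-in: as noted above, the real helper in upath should defer to filesystem_spec's own chain parsing. The names `ChainLink` and `split_chained_url` are hypothetical, and the rule that a segment without `://` (such as `simplecache`) is a bare protocol with an empty path is an assumption made for illustration.

```python
from typing import Any, Dict, List, NamedTuple


class ChainLink(NamedTuple):
    """One link of a chained URL (hypothetical helper type, not part of upath)."""

    protocol: str
    path: str
    storage_options: Dict[str, Any]


def split_chained_url(url: str, **storage_options: Dict[str, Any]) -> List[ChainLink]:
    """Split a chained URL ("a::b://x::c://y") into (protocol, path, options) links.

    Per-link storage options are looked up by protocol name, following the
    UPath("zip://...::s3://...", zip={...}, s3={...}) convention.
    """
    links = []
    for segment in url.split("::"):
        if "://" in segment:
            protocol, path = segment.split("://", 1)
        else:
            # Assumption: a segment without "://" (e.g. "simplecache") is a
            # bare protocol with no path of its own.
            protocol, path = segment, ""
        # Copy so callers cannot mutate the dict passed in by the user.
        options = dict(storage_options.get(protocol, {}))
        links.append(ChainLink(protocol, path, options))
    return links


links = split_chained_url(
    "simplecache::zip://path/in/archive/spreadsheet.csv::s3://mybucket/data.zip",
    s3={"anon": True},
)
for link in links:
    print(link)
```

Each link keeps its own options dict, which is what the `.chain` attribute in item 2 would need in order to rebuild a filesystem for any position in the chain.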

To give an example of how this could look:

```python
# interface does not exist yet
# this is just a mockup

>>> from upath import UPath
>>> pth = UPath("simplecache::zip://path/in/archive/spreadsheet.csv::s3://mybucket/data.zip")
>>> pth
S3Path("simplecache::zip://path/in/archive/spreadsheet.csv::s3://mybucket/data.zip")
>>> len(pth.chain)
3
>>> pth.chain[-2]
ZipPath("simplecache::zip://path/in/archive/spreadsheet.csv::s3://mybucket/data.zip")
>>> pth2 = pth.chain[-2]
>>> pth2.with_name("other_spreadsheet.csv")
ZipPath("simplecache::zip://path/in/archive/other_spreadsheet.csv::s3://mybucket/data.zip")
```
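The key property in the mockup is that every element of `.chain` still remembers the whole chain, so `with_name` on one link rebuilds the full chained URL. A rough sketch of that bookkeeping, assuming a hypothetical `ChainedPath` class (not upath's API) that stores all links plus the index of the link it represents. The default index of `-1` mirrors the mockup, where `pth` itself is the S3 path.

```python
import posixpath
from typing import List, NamedTuple, Sequence


class Link(NamedTuple):
    protocol: str
    path: str  # "" for path-less links such as "simplecache"


class ChainedPath:
    """Hypothetical sketch of a path that remembers every link of its chain.

    `index` selects which link this object represents; the other links are
    kept so the full chained URL can always be rebuilt.
    """

    def __init__(self, links: Sequence[Link], index: int = -1) -> None:
        self._links = tuple(links)
        self._index = index % len(self._links)  # normalise negative indices

    @property
    def chain(self) -> List["ChainedPath"]:
        # One view per link; all views share the same links, only the focus differs.
        return [ChainedPath(self._links, i) for i in range(len(self._links))]

    @property
    def protocol(self) -> str:
        return self._links[self._index].protocol

    def with_name(self, name: str) -> "ChainedPath":
        # Replace the final path component of the focused link only.
        link = self._links[self._index]
        new_path = posixpath.join(posixpath.dirname(link.path), name)
        links = list(self._links)
        links[self._index] = link._replace(path=new_path)
        return ChainedPath(links, self._index)

    def __str__(self) -> str:
        parts = [
            f"{link.protocol}://{link.path}" if link.path else link.protocol
            for link in self._links
        ]
        return "::".join(parts)


pth = ChainedPath([
    Link("simplecache", ""),
    Link("zip", "path/in/archive/spreadsheet.csv"),
    Link("s3", "mybucket/data.zip"),
])
pth2 = pth.chain[-2]
print(pth2.protocol)
print(pth2.with_name("other_spreadsheet.csv"))
```

Keeping the links immutable (a tuple of named tuples) means each `with_name` call returns a new object and never changes the path it was called on, in line with pathlib's behaviour.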

When this gets implemented, there will very likely be a few complications to solve. For example, the various caches implemented in filesystem_spec don't take a path of their own; they just use the path of the next filesystem in the chain. So this should probably be special-cased in UPath somehow.
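One possible way to special-case the caches is to resolve, for each link, the path it effectively operates on, letting a cache link borrow the path of the next link. This is only a sketch: `Link`, `effective_paths` and the `CACHE_PROTOCOLS` set are hypothetical, and the set lists a few of filesystem_spec's cache protocol names for illustration rather than claiming to be exhaustive.

```python
from typing import List, NamedTuple

# Illustrative, not exhaustive: filesystem_spec cache protocols with no path of their own.
CACHE_PROTOCOLS = {"simplecache", "filecache", "blockcache"}


class Link(NamedTuple):
    protocol: str
    path: str


def effective_paths(links: List[Link]) -> List[str]:
    """Return the path each link operates on, borrowing the next link's path for caches."""
    paths = [link.path for link in links]
    # Walk right-to-left so a run of consecutive caches all inherit the
    # path of the first non-cache link that follows them.
    for i in range(len(links) - 2, -1, -1):
        if links[i].protocol in CACHE_PROTOCOLS:
            paths[i] = paths[i + 1]
    return paths


links = [
    Link("simplecache", ""),
    Link("zip", "path/in/archive/spreadsheet.csv"),
    Link("s3", "mybucket/data.zip"),
]
print(effective_paths(links))
```

Operations such as `with_name` would then have to skip over cache links and act on the link whose path they borrowed.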

Also, when instantiating UPath with a chained URL, storage options should be provided as dicts passed as keyword arguments named after the protocol they target, i.e. UPath("zip://pth1::s3://bucket/pth2", zip={...}, s3={...})

I'll try to provide more information once I am back from traveling!

Cheers,
Andreas
