Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relative paths don't work when _protocol_dispatch = False. #211

Open
keunhong opened this issue Mar 26, 2024 · 4 comments
Open

Relative paths don't work when _protocol_dispatch = False. #211

keunhong opened this issue Mar 26, 2024 · 4 comments
Labels
documentation 📘 Improvements or additions to documentation question ❓ Further information is requested

Comments

@keunhong
Copy link

The base pathlib.Path and upath.PosixUPath correctly handle relative paths, but when subclassed relative paths stop working. A prefix / is always prepended:

class MyPath(upath.UPath):
  _protocol_dispatch = False


MyPath("test")
>>> MyPath('/test')
upath.UPath("test")
>>> PosixUPath('test')
@ap--
Copy link
Collaborator

ap-- commented Mar 26, 2024

Hello @keunhong,

When protocol dispatch is turned off, the default implementation of UPath is used for all the protocols (~= filesystems) provided, and it's left to the implementation to handle any special cases. However, this might not always yield the behavior you're looking for.

Could you kindly elaborate on your specific use case? Perhaps together we can explore alternative solutions that are easier to manage.

Best regards,
Andreas 😃

@ap-- ap-- added documentation 📘 Improvements or additions to documentation question ❓ Further information is requested labels Mar 26, 2024
@keunhong
Copy link
Author

I wanted to subclass UPath to override some of the member functions, for example to handle caching like here:

class MyUPath(upath.UPath):
    """A thin wrapper around UPath."""

    # Disable protocol dispatching (replacing class with the subclasses).
    _protocol_dispatch = False

    def open(self, mode: str = "r", *args: Any, **fsspec_kwargs: Any) -> IO[Any]:  # pylint: disable=keyword-arg-before-vararg
        """Open this path. See :py:meth:`UPath.open`."""
        # First fetch the storage options from the class but allow them to be overridden
        # by the keyword arguments.
        storage_options = {**self.storage_options, **fsspec_kwargs}
        cache_storage = storage_options.get("cache_storage")
        cache_timeout = storage_options.get("cache_timeout")

        # If caching is configured and this is a file read, use a cached read.
        if "r" in mode and cache_storage is not None:
            fs = fsspec.filesystem(
                "filecache",
                fs=self.fs,
                cache_storage=str(cache_storage),
                cache_timeout=cache_timeout,
            )
            return fs.open(self, mode, *args, **storage_options)

        return super().open(mode, *args, **fsspec_kwargs)  # pylint: disable=unspecified-encoding

Is there a way to achieve this without turning off protocol dispatch?

@ap--
Copy link
Collaborator

ap-- commented Mar 27, 2024

A few more questions:

  • Do you expect MyUPath to only work for a single protocol, or for all protocols supported by UPath?
  • Is local caching the only reason for wanting to override open, or are there other use cases that require method overrides?
  • Is there a specific reason you are special casing read access to use a WholeFileCacheFileSystem, instead of just using a SimpleCacheFileSystem?

Is there a way to achieve this without turning off protocol dispatch?

In the "single protocol" case, let's say for s3, you can just derive your MyUPath class from S3Path, override open, and directly use MyUPath("s3://bucket/somewhere").
In case you want UPath("s3://bucket/somewhere") to return a MyUPath instance, you can register your implementation instead of the default via:

from upath.implementations.cloud import S3Path
from upath.registry import register_implementation

class MyS3Path(S3Path):
    def open(...):
        ...

register_implementation("s3", MyS3Path, clobber=True)

assert isinstance(UPath('s3://bucket/file'), MyS3Path)

The "all (or many) protocols" case is currently not well supported due to the class inheritance structure replicated from stdlib pathlib. You would have to override the open method for every UPath subclass (or every protocol you intend to use). Note that once urlpath chaining #28 is implemented, fsspec caches will be natively supported in universal-pathlib.


My recommended workaround for now would be to use a context manager that provides files from the cache. When #28 is implemented, you can then switch to using UPath("filecache::proto://some/path", filecache={...}, proto={...})

The context manager could look like this:

from contextlib import contextmanager

@contextmanager
def open_cached(pth: UPath, mode, *args, **fsspec_kwargs):

    storage_options = {**pth.storage_options, **fsspec_kwargs}
    cache_storage = storage_options.get("cache_storage")
    cache_timeout = storage_options.get("cache_timeout")

    # If caching is configured and this is a file read, use a cached read.
    if "r" in mode and cache_storage is not None:
        fs = fsspec.filesystem(
            "filecache",
            fs=pth.fs,
            cache_storage=str(cache_storage)
            cache_timeout=cache_timeout,
            )
        yield fs.open(pth.path, mode, *args, **storage_options)
    else:
        yield pth.open(mode, *args, **fsspec_kwargs)

@keunhong
Copy link
Author

Ah I see the issue now. I wanted the second behavior you mentioned which is having MyUPath support all protocols supported by UPath. The context manager solution should work for now. Excited to see the urlpath chaining make it in.

Thank you for the help, and for the excellent library!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation 📘 Improvements or additions to documentation question ❓ Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants