Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LocalFileSystem does not work with CachingFileSystem #1618

Open
Alessi42 opened this issue Jun 3, 2024 · 3 comments
Open

LocalFileSystem does not work with CachingFileSystem #1618

Alessi42 opened this issue Jun 3, 2024 · 3 comments

Comments

@Alessi42
Copy link

Alessi42 commented Jun 3, 2024

I am attempting to use the block wise cache to read from networked drive however I am finding that the cache does not store any blocks.

I believe this is due to the fact that LocalFileOpener does not inherit from AbstractBufferedFile but io.IOBase

I have experimented by modifying LocalFileOpener to use AbstractBufferedFile instead and and it appears to work as expected.

Any help would be greatly apprecitated.

Modified LocalFileOpener:

class LocalFileOpener(AbstractBufferedFile):
    def __init__(
        self, path, mode, autocommit=True, fs=None, compression=None, **kwargs
    ):
        super().__init__(fs, path, mode, **kwargs)

        logger.debug("open file: %s", path)
        self.f = None
        self.compression = get_compression(path, compression)
        # self._open()

    def _open(self):
        if self.f is None or self.f.closed:
            if self.autocommit or "w" not in self.mode:
                self.f = open(self.path, mode=self.mode)
                if self.compression:
                    compress = compr[self.compression]
                    self.f = compress(self.f, mode=self.mode)
            else:
                # TODO: check if path is writable?
                i, name = tempfile.mkstemp()
                os.close(i)  # we want normal open and normal buffered file
                self.temp = name
                self.f = open(name, mode=self.mode)
            if "w" not in self.mode:
                self.size = self.f.seek(0, 2)
                self.f.seek(0)
                self.f.size = self.size

    def _fetch_range(self, start, end):
        # probably only used by cached FS
        if "r" not in self.mode:
            raise ValueError
        self._open()
        self.f.seek(start)
        return self.f.read(end - start)
@martindurant
Copy link
Member

Your understanding is correct. Your code looks reasonable, but I worry whether calling seek() for every single read has a performance cost. Additionally, using AbstractBufferedFile would invoke the builtin bytes caching mechanism by default, which very likely is a bad idea when we expect the OS to also be doing readahead smart(er) caching.

@Alessi42
Copy link
Author

Alessi42 commented Jun 4, 2024

Thank you for your response. I also have concerns about the continuous seek. The _fetch_range implementation remains unchanged from the existing LocalFileOpener. Anecdotally, I haven’t noticed a significant performance difference between the two.

I’ve also only focused here on the read implementation here and haven’t explored what it would take to make this work for write operations as well.

@martindurant
Copy link
Member

It is plausible that write already works, worth a try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants