-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid copying files with same name prefix when copying folder #576
Changes from 2 commits
0be233f
6d46a60
13a6ac6
d1ff25f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -989,18 +989,6 @@ async def _info(self, path, generation=None, **kwargs): | |
else: | ||
raise FileNotFoundError(path) | ||
|
||
async def _glob(self, path, prefix="", **kwargs): | ||
if not prefix: | ||
# Identify pattern prefixes. Ripped from fsspec.spec.AbstractFileSystem.glob and matches | ||
# the glob.has_magic patterns. | ||
indstar = path.find("*") if path.find("*") >= 0 else len(path) | ||
indques = path.find("?") if path.find("?") >= 0 else len(path) | ||
indbrace = path.find("[") if path.find("[") >= 0 else len(path) | ||
|
||
ind = min(indstar, indques, indbrace) | ||
prefix = path[:ind].split("/")[-1] | ||
return await super()._glob(path, prefix=prefix, **kwargs) | ||
|
||
async def _ls( | ||
self, path, detail=False, prefix="", versions=False, refresh=False, **kwargs | ||
): | ||
|
@@ -1410,6 +1398,9 @@ async def _find( | |
else: | ||
_prefix = key | ||
|
||
if _prefix != "" and await self._isdir(f"{bucket}/{_prefix}"): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there no way to avoid isdir here? This doubles the number of calls for a simple list - and isdir surely is doing a list op anyway! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're right, it's not useful to double the calls there. That's why I initially indicated that a better solution should probably exist ;) I updated the code with this commit to try out another solution inspired by the Let me know if that works for you. This commit is tested against this PR to fix this test. |
||
_prefix = _prefix.rstrip("/") + "/" | ||
|
||
objects, _ = await self._do_list_objects( | ||
bucket, delimiter="", prefix=_prefix, versions=versions | ||
) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -292,10 +292,11 @@ def test_gcs_glob(gcs): | |
for f in gcs.glob(TEST_BUCKET + "/nested/*") | ||
if gcs.isfile(f) | ||
) | ||
# the following is no longer true since the glob method list the root path | ||
# Ensure the glob only fetches prefixed folders | ||
gcs.dircache.clear() | ||
gcs.glob(TEST_BUCKET + "/nested**1") | ||
assert all(d.startswith(TEST_BUCKET + "/nested") for d in gcs.dircache) | ||
# gcs.dircache.clear() | ||
# gcs.glob(TEST_BUCKET + "/nested**1") | ||
# assert all(d.startswith(TEST_BUCKET + "/nested") for d in gcs.dircache) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Without the |
||
# the following is no longer true as of #437 | ||
# gcs.glob(TEST_BUCKET + "/test*") | ||
# assert TEST_BUCKET + "/test" in gcs.dircache | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we apply the fix in the
_find
method below then this glob prefix construction no longer work for the edge case highlighted here.This is because the the glob
subdir*
will create asubdir
prefix and will be handled like a folder in the_find
method (and add a trailing slash and miss thesubdir.txt
file).Also, not sure to understand the initial benefit of this
_glob
method override. Was this just for caching?Anyway, if another fix exists maybe this could be left as-is.