Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-implement glob logic #202

Merged
merged 6 commits into from
Feb 6, 2022
Merged

Re-implement glob logic #202

merged 6 commits into from
Feb 6, 2022

Conversation

pjbull
Copy link
Member

@pjbull pjbull commented Feb 5, 2022

glob, rglob, and iterdir did not work as their pathlib counterparts did.

In order to align our implementation with the pathlib versions, we adapted CloudPaths to work in the same way that the CPython implementation of glob does, which is relatively complex in order to handle all of the glob scenarios. This is based on the _Selector class.

This change introduces two internal classes, _CloudPathSelectable and _CloudPathSelectableAccessor that wrap a CloudPath to expose the methods expected by _Selector and its children.

The change also changes the internal Client._list_dir API to return a tuple of the form (entry, is_dir), which could help address #181 and #176. Internal code where we use iterdir can be replaced with _list_dir if knowing if something is a file or directory is done within the method.

In order to close #15, this also makes sure that iterdir does not include self.

In order to make the tests pass, this also closes #186, which drops support for Python 3.6

Tests for all of these changes are included. The test suite is created to mirror the tests for glob and rglob in the CPython codebase. Note: test_glob_many_open_files, which emulates a CPython tests is pretty slow. We may consider not matching to the same depth and range to speed that up if it becomes frustrating.

Note: This does leverage some internals of CPython to work correctly. It's possible that implementations without the _make_selector, _posix_flavour private apis will not work. If this is a problem, we may need to re-implement or vendor _Selector and its child classes. It seems to at least work fine on pypy not sure about anything else:

cloudpathlib onglob-new-implementation [$?] 🐍 v3.7.12 via 🅒 cloudpathlib-pypy took 2spypy
Python 3.7.12 | packaged by conda-forge | (44db2626, Nov 15 2021, 13:00:09)
[PyPy 7.3.7 with GCC Clang 11.1.0] on darwin

>>>> from cloudpathlib import CloudPath
>>>> root_dir = CloudPath("s3://drivendata-public-assets/")
>>>> list(root_dir.glob('**/*.txt'))
[S3Path('s3://drivendata-public-assets/nested_dir/test.dot/a_file.txt'), S3Path('s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt')]

Closes #154
Closes #15
Closes #186

@github-actions
Copy link
Contributor

github-actions bot commented Feb 5, 2022

@codecov
Copy link

codecov bot commented Feb 5, 2022

Codecov Report

Merging #202 (ed1a075) into master (72f0725) will increase coverage by 0.4%.
The diff coverage is 100.0%.

@@           Coverage Diff            @@
##           master    #202     +/-   ##
========================================
+ Coverage    94.3%   94.7%   +0.4%     
========================================
  Files          21      21             
  Lines        1198    1247     +49     
========================================
+ Hits         1130    1182     +52     
+ Misses         68      65      -3     
Impacted Files Coverage Δ
cloudpathlib/azure/azblobclient.py 94.6% <100.0%> (ø)
cloudpathlib/client.py 87.5% <100.0%> (ø)
cloudpathlib/cloudpath.py 93.1% <100.0%> (+1.2%) ⬆️
cloudpathlib/exceptions.py 100.0% <100.0%> (ø)
cloudpathlib/gs/gsclient.py 94.2% <100.0%> (ø)
cloudpathlib/local/localclient.py 97.4% <100.0%> (ø)
cloudpathlib/s3/s3client.py 96.1% <100.0%> (+0.9%) ⬆️
cloudpathlib/local/implementations/gs.py 100.0% <0.0%> (ø)
cloudpathlib/local/implementations/s3.py 100.0% <0.0%> (ø)
cloudpathlib/local/implementations/azure.py 97.5% <0.0%> (+<0.1%) ⬆️
... and 1 more

cloudpathlib/cloudpath.py Outdated Show resolved Hide resolved
cloudpathlib/cloudpath.py Outdated Show resolved Hide resolved
cloudpathlib/cloudpath.py Outdated Show resolved Hide resolved
cloudpathlib/cloudpath.py Outdated Show resolved Hide resolved
cloudpathlib/cloudpath.py Outdated Show resolved Hide resolved
cloudpathlib/cloudpath.py Outdated Show resolved Hide resolved
requirements-dev.txt Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@pjbull
Copy link
Member Author

pjbull commented Feb 5, 2022

@jayqi review comments implemented, thanks!

Copy link
Member

@jayqi jayqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still pretty amazed that we got this to work. Exciting stuff!

cloudpathlib/s3/s3client.py Show resolved Hide resolved
@pjbull pjbull merged commit 2084e20 into master Feb 6, 2022
@pjbull pjbull deleted the glob-new-implementation branch February 6, 2022 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants