Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Accept pathlib.Path objects where filenames are accepted #610

Merged
merged 18 commits into from
Oct 23, 2019

Conversation

CRiddler
Copy link
Contributor

For python >= 3.6, objects representing paths now have the option to have a dunder __fspath__ (See PEP 519 for more info) to return the string (or bytes) representation of the path. Pathlib on 3.6 includes this method so we should check for it first. Then, for pathlib objects from the py 3.4-3.5x, __fspath__ is not available, so we dip into the py3k module to get the current python version's (2 or 3) string method (unicode in this case). So overall we're allowing filename to be a path object, but ensuring that it will be converted to the string/bytes representation of the path before passing it on to image_klass.from_filename

My only reservations is that nib.load lowest entry point for this type of handling, because if some one really wants to call from_filename from an image_klass directly, they won't have this pathlib compatibility- however I think that nib.load is a very common entry point, which justifies the placement.

For python >= 3.6, objects representing paths now have the option to have a dunder `__fspath__` (See PEP 519 for more info) to return the string (or bytes) representation of the path. Pathlib on 3.6 includes this method so we should check for it first. Then, for pathlib objects from the py 3.4-3.5x, `__fspath__` is not available, so we dip into the py3k module to get the current python version's (2 or 3) string method (`unicode` in this case). So overall we're allowing filename to be a path object, but ensuring that it will be converted to the string/bytes representation of the path before passing it on to `image_klass.from_filename`

My only reservations is that `nib.load` lowest entry point for this type of handling, because if some one really wants to call `from_filename` from an image_klass directly, they won't have this pathlib compatibility- however I think that `nib.load` is a very common entry point, which justifies the placement.
@CRiddler CRiddler changed the title load save with pathlib.Path objects ENH: loadsave with pathlib.Path objects Mar 15, 2018
@coveralls
Copy link

coveralls commented Mar 15, 2018

Coverage Status

Coverage decreased (-0.007%) to 91.783% when pulling 9835998 on CRiddler:loadsave-pathlib_compat into 30c0bc5 on nipy:master.

nibabel/loadsave.py Outdated Show resolved Hide resolved
@nibotmi
Copy link
Contributor

nibotmi commented Mar 22, 2018

☔ The latest upstream changes (presumably #611) made this pull request unmergeable. Please resolve the merge conflicts.

@fepegar
Copy link
Contributor

fepegar commented Jun 13, 2018

Why is the PR unmergeable?

@matthew-brett
Copy link
Member

It cannot be merged because there are conflicts with the master branch - see the web interface for this PR for links.

@fepegar
Copy link
Contributor

fepegar commented Jun 13, 2018

@CRiddler, do you think you can fix the conflicts?

@codecov-io
Copy link

codecov-io commented Jun 13, 2018

Codecov Report

Merging #610 into master will increase coverage by 0.2%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #610     +/-   ##
=========================================
+ Coverage    90.1%   90.31%   +0.2%     
=========================================
  Files          96       96             
  Lines       11914    12171    +257     
  Branches     2125     2127      +2     
=========================================
+ Hits        10735    10992    +257     
  Misses        834      834             
  Partials      345      345
Impacted Files Coverage Δ
nibabel/filebasedimages.py 88.38% <ø> (+0.97%) ⬆️
nibabel/freesurfer/mghformat.py 95.55% <100%> (+0.1%) ⬆️
nibabel/loadsave.py 90.74% <100%> (+0.35%) ⬆️
nibabel/filename_parser.py 94.31% <90.9%> (+0.41%) ⬆️
nibabel/gifti/giftiio.py 100% <0%> (ø) ⬆️
nibabel/keywordonly.py 100% <0%> (ø) ⬆️
nibabel/streamlines/tractogram_file.py 100% <0%> (ø) ⬆️
nibabel/arrayproxy.py 100% <0%> (ø) ⬆️
nibabel/imageclasses.py 100% <0%> (ø) ⬆️
nibabel/streamlines/array_sequence.py 100% <0%> (ø) ⬆️
... and 33 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update debf173...60a4624. Read the comment docs.

@CRiddler
Copy link
Contributor Author

The conflicts have been resolved, however as stated earlier- this approach to resolving pathlib objects is not final nor foolproof. Essentially, we are checking if the filename is a Pathlike object which is only available in python 3.6+ and coercing that properly. If its not a Pathlike then we're coercing the object to six.unicode type. What happens when the user passes in the filename as None? Should we be coercing None to a string/unicode type? While the example is a corner case, it is still technically a case of an improperly handled input which can be fixed by a more in-depth filename handling procedure as discussed in the earlier comments.

@effigies effigies modified the milestones: 2.5.0, 3.0.0 Apr 25, 2019
@effigies
Copy link
Member

I'm setting this milestone to 3.0. Once we're only dealing with Python 3.5+, that reduces the range of behaviors we need to accommodate. If you want to try for 2.5 and support Python 2's pathlib, we can do that.

@effigies
Copy link
Member

Hi @CRiddler, @fepegar. By pushing off to Python3-only, I think we can be a lot dumber about this. I would say the logic can simply be:

filename = pathlib.Path(filename)

If we place this at the head of load, save, to_filename and from_filename, then we can just work with Paths internally, converting to str only when we hit functions that don't support them.

Sound like a reasonable approach? We would also want to add some tests where we pass in valid and invalid strings, paths, path-like objects, and non-path-likes.

@fepegar
Copy link
Contributor

fepegar commented May 22, 2019

Sounds reasonable to me.

@effigies effigies modified the milestones: 3.0.0, 3.0.0 RC1 Jul 29, 2019
@pep8speaks
Copy link

pep8speaks commented Aug 19, 2019

Hello @CRiddler, Thank you for updating!

Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, pip install flake8 and then run flake8 nibabel.

Comment last updated at 2019-10-23 15:31:04 UTC

@effigies
Copy link
Member

Hi @CRiddler, we're now requiring Python 3.5.1+, so we can resume work on this with simpler assumptions about the sort of objects we're going to be passed. I merged in master to resolve conflicts and get you going, but let me know how you want to proceed here.

@CRiddler
Copy link
Contributor Author

Hi @effigies, if we do want to convert everything into pathlib.Path objects and deal with those internally then a good amount of filename_parser.py will have to be rewritten. Do we still want to try to move forward with pathlib internally? Or should we add in the method that pandas uses to coerce pathlike objects to python3 string types?

@effigies
Copy link
Member

Hmm. I haven't really scoped out where all we will need to make modifications to use Paths internally. I figured we'd find the entry-points, and follow the logic to see where things break.

But if we want to take the Pandas approach, the entry-points are still the first step, so I don't see these as competing approaches. If anything the Pandas-like one would be a good first pass, and then we could evaluate if we want to take the plunge with internal pathlib.Paths. There are a number of places where we take either filenames or open filehandles, so this could end up getting messy, and might not be worth it.

As a note, it looks like Pandas has simplified their check: https://github.com/pandas-dev/pandas/blob/325dd686de1589c17731cf93b649ed5ccb5a99b4/pandas/io/common.py#L131-L160
Their minimum Python is now 3.5.3, so it seems pretty safe to follow suit. I don't think we need to copy the expand_user bit, though.

@CRiddler
Copy link
Contributor Author

I agree in that coercing everything to strings at the entry points will be the easiest way to go. The updated pandas method looks like it'll work perfectly for us since __fspath__ is supported in 3.6+ and coercing pathlib.Path types to str will support any 3.5 <= 3.6 versions.

I will update the branch with a function similar to pandas approach (not sure how to give them credit for that though) and then add that function in the entry points you listed. We can see where to go from there? I can probably get around to this by the end of the week.

@effigies
Copy link
Member

Cool. We have external licenses in COPYING, IIRC, so that and a link to the original code (with specific revision) should suffice. Our licenses are compatible.

@effigies
Copy link
Member

@CRiddler Just a bump on this one. Let me know if there's anything I can do to help.

Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. Some suggestions on paths to guard.

And we should also verify that we can pass pathlib.Paths to each method that's guarded. One good place would be here:

fname = 'an_image' + self.standard_extension
img.set_filename(fname)
assert_equal(img.get_filename(), fname)
assert_equal(img.file_map['image'].filename, fname)
# to_ / from_ filename
fname = 'another_image' + self.standard_extension
with InTemporaryDirectory():
img.to_filename(fname)
rt_img = img.__class__.from_filename(fname)
assert_array_equal(img.shape, rt_img.shape)
assert_almost_equal(img.get_fdata(), rt_img.get_fdata())
# get_data will be deprecated
assert_almost_equal(img.get_data(), rt_img.get_data())
del rt_img # to allow windows to delete the directory

I would just make it loop, e.g.:

fname = 'an_image' + self.standard_extension
for path in (fname, Path(fname)):
    ...

And here, as well:

nils.save(img, fname)
rt_img = nils.load(fname)

And also in https://github.com/nipy/nibabel/blob/master/nibabel/tests/test_loadsave.py.

Finally, all functions/methods that accept os.PathLike objects should have their docstrings updated.

nibabel/filebasedimages.py Outdated Show resolved Hide resolved
nibabel/filename_parser.py Outdated Show resolved Hide resolved
nibabel/filename_parser.py Outdated Show resolved Hide resolved
nibabel/filename_parser.py Outdated Show resolved Hide resolved
nibabel/filename_parser.py Outdated Show resolved Hide resolved
nibabel/filename_parser.py Show resolved Hide resolved
@effigies
Copy link
Member

effigies commented Oct 3, 2019

@CRiddler Just a bump on this. Let me know if there's anything I can help with. I'm going to aim to feature freeze 3.0 on October 28 (three weeks from Monday) so we can have a decent RC period.

@CRiddler
Copy link
Contributor Author

CRiddler commented Oct 3, 2019

Yep! This is still on my radar, classes started for me this past week so I've been a little busy. This is the first on my to-do list once I have a bit of free time, so I should have it done over the weekend.

@effigies
Copy link
Member

effigies commented Oct 3, 2019

I mean... also enjoy your weekend.

Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Some minor fixes and cleanups.

Also from one of the tests:

======================================================================
ERROR: autogenerated test from validate_filenames
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\site-packages\nibabel\tests\test_api_validators.py", line 20, in meth
    validator(self, imaker, params)
  File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\site-packages\nibabel\tests\test_image_api.py", line 146, in validate_filenames
    img.set_filename(path)
  File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\site-packages\nibabel\filebasedimages.py", line 255, in set_filename
    self.file_map = self.__class__.filespec_to_file_map(filename)
  File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\site-packages\nibabel\freesurfer\mghformat.py", line 533, in filespec_to_file_map
    if splitext(filespec)[1].lower() == '.mgz':
  File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\ntpath.py", line 224, in splitext
    return genericpath._splitext(p, '\\', '/', '.')
  File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\genericpath.py", line 118, in _splitext
    sepIndex = p.rfind(sep)
AttributeError: 'WindowsPath' object has no attribute 'rfind'

It looks like MGHImage has its own implementation of filespec_to_file_map that needs to be updated.

nibabel/filename_parser.py Show resolved Hide resolved
nibabel/loadsave.py Show resolved Hide resolved
nibabel/loadsave.py Show resolved Hide resolved
nibabel/tests/test_image_api.py Outdated Show resolved Hide resolved
nibabel/tests/test_image_api.py Outdated Show resolved Hide resolved
nibabel/tests/test_image_load_save.py Outdated Show resolved Hide resolved
@effigies
Copy link
Member

Hmm. Looks like your history got a bit messed up there.

Screen Shot 2019-10-16 at 16 51 58

I'd try rebasing onto fbaeacc. Should be pretty clean, as it's the merge commit that seems to be causing damage.

@CRiddler
Copy link
Contributor Author

Hopefully that fixed, I always struggle when it comes to wrestling with git history. If it needs some more reworking, I may need some hand-holding to resolve the history issue.

@effigies
Copy link
Member

Nope.

Screen Shot 2019-10-16 at 17 37 34

So the easy way to do this is just:

git rebase -i fbaeacc

That will open an editor, probably vim or nano:

pick 6056014b update _stringify_path doc
pick b54926e3 update docstrings to accept str or os.PathLike
pick 8a34a7a0 mghformat accept pathlib for filespec_to_file_map
pick d6c96c02 tests pathlib compatible
pick cf631067 fix _stringify_path doc

That should be fine, so save and exit. Now you'll see:

Auto-merging nibabel/loadsave.py
Auto-merging nibabel/filename_parser.py
CONFLICT (content): Merge conflict in nibabel/filename_parser.py
Auto-merging nibabel/filebasedimages.py
error: could not apply b54926e3... update docstrings to accept str or os.PathLike
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply b54926e3... update docstrings to accept str or os.PathLike

If you run git status, you should see:

interactive rebase in progress; onto fbaeacc9
Last commands done (2 commands done):
   pick 6056014b update _stringify_path doc
   pick b54926e3 update docstrings to accept str or os.PathLike
Next commands to do (3 remaining commands):
   pick 8a34a7a0 mghformat accept pathlib for filespec_to_file_map
   pick d6c96c02 tests pathlib compatible
  (use "git rebase --edit-todo" to view and edit)
You are currently rebasing branch 'loadsave-pathlib_compat' on 'fbaeacc9'.
  (fix conflicts and then run "git rebase --continue")
  (use "git rebase --skip" to skip this patch)
  (use "git rebase --abort" to check out the original branch)

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   nibabel/loadsave.py

Unmerged paths:
  (use "git reset HEAD <file>..." to unstage)
  (use "git add <file>..." to mark resolution)

	both modified:   nibabel/filename_parser.py

So edit nibabel/filename_parser.py as you would during a merge conflict (search for <<<< and resolve), then:

git add nibabel/filename_parser.py
git rebase --continue

You may be dropped into an editor again, giving you a chance to modify the commit message. You can just save and exit, to leave it unchanged. Then you might see:

[detached HEAD 2d37fb71] update docstrings to accept str or os.PathLike
 Author: Cameron Riddell <[email protected]>
 1 file changed, 2 insertions(+), 2 deletions(-)
The previous cherry-pick is now empty, possibly due to conflict resolution.
If you wish to commit it anyway, use:

    git commit --allow-empty

Otherwise, please use 'git reset'
interactive rebase in progress; onto fbaeacc9
Last commands done (5 commands done):
   pick d6c96c02 tests pathlib compatible
   pick cf631067 fix _stringify_path doc
No commands remaining.
You are currently rebasing branch 'loadsave-pathlib_compat' on 'fbaeacc9'.

nothing to commit, working tree clean
Could not apply cf631067... fix _stringify_path doc

In this case, there's a commit that had no effect. So you can just

git rebase --continue

When you've iterated through all of the necessary changes:

> git status
On branch loadsave-pathlib_compat
Your branch and 'origin/loadsave-pathlib_compat' have diverged,
and have 3 and 6 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)

nothing to commit, working tree clean

Since your GitHub branch has a lot of old stuff in it, instead of:

git diff origin/loadsave-pathlib_compat

I would suggest just comparing to the commit you rebased against:

git diff fbaeacc9

If that looks reasonable, go ahead and force-push:

git push -f

Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Two remaining comments that may have gotten lost in the mix.

nibabel/filename_parser.py Outdated Show resolved Hide resolved
nibabel/filename_parser.py Outdated Show resolved Hide resolved
Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple style issues (see #610 (comment)). And one small fix:

nibabel/filebasedimages.py Outdated Show resolved Hide resolved
@effigies effigies changed the title ENH: loadsave with pathlib.Path objects ENH: Accept pathlib.Path objects where filenames are accepted Oct 23, 2019
@effigies
Copy link
Member

Great! Thanks for this.

@effigies effigies merged commit ae2fa36 into nipy:master Oct 23, 2019
@fepegar
Copy link
Contributor

fepegar commented Oct 23, 2019

Yay! Thank you guys for having worked on this.

@effigies effigies modified the milestones: 3.0.0 RC1, 3.0.0 Oct 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants