Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow mkdir(p"s3://path/not/ending/in/slash") #124

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

allow mkdir(p"s3://path/not/ending/in/slash") #124

wants to merge 1 commit into from

Conversation

kolia
Copy link
Contributor

@kolia kolia commented Dec 15, 2020

aligning with Base.mkdir behavior.

aligning with `Base.mkdir` behavior.
@rofinn
Copy link
Member

rofinn commented Dec 15, 2020

This doesn't really make sense with S3 paths though. For example, you can have an object called foo and a prefix with foo/ and they mean different things, unlike on traditional system paths. I get that you're only changing it for mkdir, but I personally think it's clearer to error and push that onto the user as necessary. I think this is one of the places where forcing base assumptions on all potential filepaths is a bad idea.

@mattBrzezinski
Copy link
Member

@rofinn has written the S3Pathing component of this package so I deferred to him.

The title vs. code changes are slightly confusing to me, do you mean you want the ability to do:

mkdir(S3Path("s3://mattbr-test-bucket/sub-dir"))

Without the slash there? If so, I think that's a reasonable change to make since you're calling mkdir and know that it will be creating a directory in a bucket. All we need to do in that case is append a / somehow.

Otherwise there is a distinction on S3 with objects foo and foo/ as Rory mentioned.

@rofinn
Copy link
Member

rofinn commented Dec 15, 2020

All we need to do in that case is append a / somehow.

@mattBrzezinski That's what S3Path(fp.segments, fp.root, fp.drive, true, fp.config) is doing because true is specifying the path as a directory.

If so, I think that's a reasonable change to make since you're calling mkdir and know that it will be creating a directory in a bucket.

My disagreement with this is that if there's a distinction then it should probably be applied everywhere rather than special casing specifc functions, where we're just assuming something is a file or directory. We've been burned by things like this in the past and letting folks be lazy about the destinction sometimes, but not others seems like we're asking for trouble.

@kolia
Copy link
Contributor Author

kolia commented Dec 16, 2020

This came up while passing S3Paths through code that is not aware of it: mkdir(joinpath(some_path, "other_folder")).

Guess it comes down to how closely you want S3 paths to be compatible with non-AWSS3-aware path manipulations, which can probably not be achieved perfectly.

In this case though, the caller is clearly trying to make a dir, not a file/object, seems safe to assume they intended to pass in "other_folder/", not "other_folder".

@rofinn
Copy link
Member

rofinn commented Dec 16, 2020

Guess it comes down to how closely you want S3 paths to be compatible with non-AWSS3-aware path manipulations, which can probably not be achieved perfectly.

Yeah, I think we erred on the side of caution, i.e., strict S3 compatibility, as it's easier to constrain inputs on the user side rather than risk having unexpected behaviour when combining generic functions.

In this case though, the caller is clearly trying to make a dir, not a file/object, seems safe to assume they intended to pass in "other_folder/", not "other_folder".

Yeah, I suspect there's a few other cases where we could make that argument though. I'll make an issue to document all the possible cases where we could choose to infer the isdirectory property based on the function called.

@ericphanson
Copy link
Member

ericphanson commented Jul 15, 2021

Another case where this comes up:

julia> path = S3Path("s3://bucket/prefix/test/");

julia> tmp = mktempdir(path)
ERROR: ArgumentError: S3Path folders must end with '/': s3://bucket/prefix/test/448e28ff-f9c7-4e8d-a20d-17f98c00198a

Though looking at the code of mktempdir, probably it will run into other issues later even if this was fixed.

@rofinn
Copy link
Member

rofinn commented Jul 15, 2021

Hmm, S3Path overloadsmktempdir to include the trailing /, so it seems like a bug that it's getting stripped away.

@rofinn
Copy link
Member

rofinn commented Jul 15, 2021

I wonder if this package should have a flag for deciding how strictly to follow the S3 conventions? Maybe giving a single warning about dragons when you change it? I know I've been burnt by having directories and objects with the same name, but that should probably be a risk users can decide for themselves.

@ericphanson
Copy link
Member

ericphanson commented Sep 7, 2021

Not having this also seems to break cp-- when it recurses it tries to make directories without a / and throws.

Edit: Actually I think this should be fixed in cp (ref rofinn/FilePathsBase.jl#128). If we make this change here, what happens is

dir = S3Path(bucket, key_without_final_slash)
if FilesPathBase.exists(dir)
   ...
else
   mkdir(dir)
end

can throw! Why? Because dir doesn't exist but dir*"/" can exist. So mkdir will throw an error that dir already exists even though we checked that it doesn't already. (This comes up in cp).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants