Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added an API for working with wheel files #805

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

agronholm
Copy link

Closes #697.

@agronholm
Copy link
Author

Note to self: remember to take into account older versions of specs (pypa/wheel#440 (comment)) when normalizing/validating wheel names.

tags: frozenset[Tag]

@classmethod
def from_filename(cls, fname: str) -> WheelMetadata:
Copy link
Contributor

@henryiii henryiii Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have the other direction, too? A to_filename method? Or even just a __str__ that would produce the filename? I had a method like this:

@property
def name_ver(self) -> str:
    name = self.metadata.canonical_name.replace("-", "_")
    # replace - with _ as a local version separator
    version = str(self.metadata.version).replace("-", "_")
    return f"{name}-{version}"

@property
def basename(self) -> str:
    pyver = ".".join(sorted({t.interpreter for t in self.tags}))
    abi = ".".join(sorted({t.abi for t in self.tags}))
    arch = ".".join(sorted({t.platform for t in self.tags}))
    optbuildver = (
        [self.wheel_metadata.build_tag] if self.wheel_metadata.build_tag else []
    )
    return "-".join([self.name_ver, *optbuildver, pyver, abi, arch])

Edit: ahh, I see the new helper function. But it seems like wrapping it in a method here would be natural. But the functionality is there in make_wheel_filename.

Why isn't this BuildTag | None, by the way? I guess it supports an empty tuple. Though then it seems make_wheel_filename could default to an empty tuple instead of supporting None.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this BuildTag | None, by the way? I guess it supports an empty tuple. Though then it seems make_wheel_filename could default to an empty tuple instead of supporting None.

Can you clarify what you're talking about here?

@henryiii
Copy link
Contributor

How would one support prepare_metadata_for_build_wheel using this? I had functionality to only write out the metadata dir. I guess you could write an empty wheel then copy out the .dist-info directory; is that the only way?

msg.add_header(key, value)

if "Metadata-Version" not in msg:
msg["Metadata-Version"] = "2.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why default to 2.1? 2.3 is the current version, and in general, a default can't be safely updated, so IMO this should be required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware; I just copied this from the old code.

@agronholm
Copy link
Author

How would one support prepare_metadata_for_build_wheel using this? I had functionality to only write out the metadata dir. I guess you could write an empty wheel then copy out the .dist-info directory; is that the only way?

Separating the metadata preparation step should cover that use case. No need to jump through unnecessary hoops then.

@agronholm
Copy link
Author

How would one support prepare_metadata_for_build_wheel using this? I had functionality to only write out the metadata dir. I guess you could write an empty wheel then copy out the .dist-info directory; is that the only way?

Separating the metadata preparation step should cover that use case. No need to jump through unnecessary hoops then.

What changes would that actually entail? From what I can see, only write_metadata() would need to be separated into a free function that would take the package base name and the metadata version as its arguments, yes? Anything else?

@MrMino
Copy link
Contributor

MrMino commented Aug 5, 2024

Hello there 👋 I'm the person who had the bright idea to build this jankyness back in the day, and now has to maintain it for the sake of all internal APIs we've built with it in the company I work for :).

It would be great to have one method for writing both files and directories. This way, if there's a list of paths to include (e.g. from a configuration file), the code doesn't have to perform the if os.path.isdir(path): dance. It also adds nice symmetry for people who would use this in conjunction with tarfile.TarFile.add to create a PEP-517 builder.

Out of curiosity: what's the reason behind the split between WheelReader and WheelWriter? I.e. why not just have a single class with different modes of operation? I realize that doing it this way makes the code far simpler, but is there any other reason, perhaps related to IO considerations?

@agronholm
Copy link
Author

It would be great to have one method for writing both files and directories. This way, if there's a list of paths to include (e.g. from a configuration file), the code doesn't have to perform the if os.path.isdir(path): dance.

Can you elaborate on this? Only write_files_from_directory() uses Path.is_dir(), but I don't see how that's a problem.

It also adds nice symmetry for people who would use this in conjunction with tarfile.TarFile.add to create a PEP-517 builder.

I frankly don't understand what tar files have to do with wheels.

@MrMino
Copy link
Contributor

MrMino commented Aug 7, 2024

Can you elaborate on this? Only write_files_from_directory() uses Path.is_dir(), but I don't see how that's a problem.

It is the API user that will have to Path.is_dir() each path, and that's my point.

I would like to do a WheelWriter.write(path) without having to worry whether its a directory or a file. Right now I have to perform a check first, only then I'll know whether I can use write_files_from_directory or just write_file. Or am I missing something here?

Since you are already performing that check in write_files_from_directory, why not make a write method that can take in both files and directories?

I frankly don't understand what tar files have to do with wheels.

PEP-517 defines two hooks: one for bdists, and one for sdists. I can TarFile.add a path to an sdist, without having to check whether it's a dir or not, yet I cannot WheelWriter.write it to a wheel. It's a minor thing, but if someone wants a builder for both sdists and bdists, it makes the code cleaner when whatever paths are chosen to be added can be treated similarly by both hooks.

@agronholm
Copy link
Author

I would like to do a WheelWriter.write(path) without having to worry whether its a directory or a file

Can you elaborate on your use case? Why can you not use write_files_from_directory()?

@MrMino
Copy link
Contributor

MrMino commented Aug 12, 2024

Why can you not use write_files_from_directory()?

Because it will throw a WheelError if the user input results in a path to a file being given to it, e.g. when adding files from a list of paths. I need to check each path and switch to write_file if that's the case, which is a pattern that the API should cover for me. I'm questioning the ergonomics of having to choose between those two methods at runtime. The API should be liberal in what it accepts without me having to add the is_dir() boilerplate.

I'm unsure what there is to elaborate further. I feel like I'm wasting everyone's time with a relatively minor point.

The code is already making the necessary check in write_files_from_directory, why not just write_file the path there instead of raising the WheelError(f"{basedir} is not a directory")? All that is left afterwards is renaming it to something more generic, e.g. write.

@agronholm
Copy link
Author

I'm unsure what there is to elaborate further. I feel like I'm wasting everyone's time with a relatively minor point.

I've been asking about your use case and you haven't been answering my question, so I will keep asking until you do.

What reason do you have to use the API to write individual files/directories to the wheel? That's what I've been wanting to find out. Are the contents coming from outside the file system?

@MrMino
Copy link
Contributor

MrMino commented Aug 12, 2024

:) I thought this was good enough: writing to a wheel based on a list of paths. We are talking about generic mechanic, so I gave a generic use case.

This should come down to API design principles, not to whether I'm capable of coming up with examples on the level of specificity you expect, but let me try:

A dead-simple build backend that creates wheels by packaging only the paths given by the user, as they are specified inside a list in one of pyproject.toml tables: ["file.py", "some_src_dir", "some_other_dir/data.bin"].

Speaking personally, I don't want to have to care whether each path is a directory/symlink/file. That's all there is to this. Really. I don't need 2 (or, god forbid, 3) methods for this, and I don't want to have to choose between them based on my own path checks - I'd like the API to take care of the checking and choosing what's right for each path.

What reason do you have to use the API to write individual files/directories to the wheel? That's what I've been wanting to find out. Are the contents coming from outside the file system?

It's akin to asking "what reason do you have to use the TarFile to write individual files/directories to a tarball". Lots! 😉. Maybe we are talking past each other - I still want the directories to be walked through and recursively added to the wheel. I just don't want to have to check and choose different method for each path. It's just one if path.is_dir(): ... else ... clause less for the users of this API. 😄

There's no esoteric use case here. It's just a trivial point about the ergonomics of this API. I don't want to bikeshed this any further.

@agronholm
Copy link
Author

My point is that if you're unable to come up with any practical use case for making this change, then why should I do it? It's hardly economical for write() to check if the argument is a file or directory, given the presence of write_files_from_directory(), as ithe former involves an extra system call. For more esoteric use cases, it can also accept a bytestring or unicode string, or even an open file object.

@MrMino
Copy link
Contributor

MrMino commented Aug 12, 2024

My point is that if you're unable to come up with any practical use case for making this change, then why should I do it?

I gave you a practical use case. I already use this to help QA folks package their Robot Framework sources into wheels. They do not necessarily have a good understanding of how Python build systems work, so having an explicit list of paths helps a lot.

It's hardly economical for write() to check if the argument is a file or directory, given the presence of write_files_from_directory(), as ithe former involves an extra system call.

If the cost of adding one syscall to write_files_from_directory() is too high, it would be far more productive for me and you if you just led with that. I was honestly second guessing whether we are talking about the same thing or whether I'm understanding the objectives of this API.

I don't think it's a good argument. write_files_from_directory() already does that syscall on L520, it just doesn't do anything useful with that check apart from bouncing the API user if they forget to check if the path is actually a directory.

Further, current semantics mean that the code using this API has to perform this check twice: once to figure out which method to use, then, again inside write_files_from_directory(). So it's actually less economical this way.

Why not just make that method do both and have one less sharp edge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding a wheel management API
4 participants