Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax typing of _key on _BaseVersion #669

Merged
merged 4 commits into from
Feb 6, 2023
Merged

Conversation

di
Copy link
Member

@di di commented Jan 21, 2023

TL;DR:
Changing this typing allows other libraries to extend _BaseVersion and introduce new subclasses that are comparable with Version, and allows static typing to remain correct for both packaging and these other libraries.

Long story:
PyPI is currently unable to upgrade to the latest version of packaging because we need to continue using LegacyVersions -- we still host releases with "legacy" versions, still need to sort "legacy" versions along with "non-legacy" versions, etc. and will likely continue doing so indefinitely.

Instead of remaining stuck on an old version of packaging forever, I'm breaking out the removed LegacyVersion class into a new package, packaging_legacy that provides a drop-in replacement with the removed functionality, deferring as much as possible to the upstream packaging library, and evolving in sync with new releases here.

(Right now, this only resurrects packaging.version.LegacyVersion and packaging.version.parse) because that's all we need.)

Overall, this works great, however type checking fails because _BaseVersion._key is typed too strictly. The old annotation was Union[CmpKey, LegacyCmpKey], and the current annotation CmpKey doesn't allow me to define a LegacyVersion class that inherits from _BaseVersion and provides a _key with the type LegacyCmpKey.

To resolve this, this PR relaxes the typing of _BaseVersion._key to Tuple[Any, ...] and then moves the stricter CmpKey type to Version._key. This should have zero effect on this library and make no difference to anyone downstream.

PS: I'll acknowledge the overall idea is somewhat gross, but I think it's the best way to support things that need to continue supporting legacy versions (as was raised in #530 and #631) without introducing a complete fork of this library or vendoring LegacyVersion and everything related to it into things that need to use it. If you have better ideas, I'd love to hear them.

@pradyunsg
Copy link
Member

Would PyPI's model be easier if packaging just kept the LegacyVersion class and didn't use it; such that PyPI had to keep its own parse_version that does what the old one did?

@pradyunsg
Copy link
Member

I'm asking since it'll be cleaner to have the implementation-used-by-pypi be available in this package rather than somewhere else.

What we want is to push all packages used with build/run tooling, regardless of whether they're on PyPI or not, to use PEP 440 style versions.

@dstufft
Copy link
Member

dstufft commented Jan 24, 2023

I think it's fine for legacy version numbers to require extra opt in somehow... maybe that's the packaging_legacy library, maybe there can be a packaging.legacy package, I dunno.

One thing I'm curious about is whether Warehouse actually needs this. I haven't looking really closely, but for instance you should be able to implement version ordering by doing something like:

from packaging.version import Version

def _version_key(version):
    try:
        return (1, Version(version))
    except Exception:  # I forget the real exception offhand
        return (0, version)

versions = ["1.0-dog", "1.0"]
versions.sort(key=_version_key)

I don't recall if we actually pull information out of version objects or not, or if it's primarily just sorting.

@di
Copy link
Member Author

di commented Jan 24, 2023

Would PyPI's model be easier if packaging just kept the LegacyVersion class and didn't use it; such that PyPI had to keep its own parse_version that does what the old one did?

It doesn't really make any difference to me. I think it's probably easier for everyone to just maintain this separately.

One thing I'm curious about is whether Warehouse actually needs this. I haven't looking really closely, but for instance you should be able to implement version ordering by doing something like:

It's not just sorting, we do attempt to parse the legacy versions in a few places as well, though we could maybe avoid that for legacy versions. I can explore it. Here's the diff I have now:

https://github.com/pypi/warehouse/compare/main...di:warehouse:fix-packaging-upgrade?expand=1

@di
Copy link
Member Author

di commented Jan 24, 2023

(One example is on the project detail page, when we compare a given release's version to the latest version: https://github.com/pypi/warehouse/blob/aa56a2a50d9578ffb1886ce48d0773ad5853e036/warehouse/templates/packaging/detail.html#L176-L183)

@dstufft
Copy link
Member

dstufft commented Jan 24, 2023

To be clear, I don't have a strong preference. I was mostly just an idle thought if that would end up being an easy solution or not.

@abravalheri
Copy link
Contributor

abravalheri commented Jan 24, 2023

Is LegacyVersion something necessary to parse and/or validate METADATA files with old Metadata-Versions?

If that is the case and packaging Metadata API has the ambition to parse and/or validate these kind of files, it might be a good idea to have it directly in packaging...

@di
Copy link
Member Author

di commented Jan 25, 2023

I was mostly just an idle thought if that would end up being an easy solution or not.

After poking it a bit, I don't think it's going to be easy given that we still need to parse/compare legacy versions. I think we'd just end up re-implementing packaging_legacy inside PyPI's codebase. This has the advantage that other people can continue to use legacy functionality as well.

Is LegacyVersion something necessary to parse and/or validate METADATA files with old Metadata-Versions?

I don't think so. Given that we've deprecated/dropped legacy versions here, I would expect any metadata with a legacy version to be considered invalid by this library.

If that is the case and packaging Metadata API has the ambition to parse and/or validate these kind of files, it might be a good idea to have it directly in packaging...

Given how long they've been deprecated here and how long they've been invalid on PyPI, it's probably fine for downstream tools to just consider them invalid too. I think anyone attempting to do something with metadata with legacy versions is doing something sufficiently weird/niche (like, being PyPI) that mainstream/modern tooling doesn't need to accommodate it.

@brettcannon
Copy link
Member

Is LegacyVersion something necessary to parse and/or validate METADATA files with old Metadata-Versions?

I don't think so. Given that we've deprecated/dropped legacy versions here, I would expect any metadata with a legacy version to be considered invalid by this library.

Specifically, the direction packaging.metadata is going is a low-level "raw" API and a high-level strict, enriched API. So for those who need compatibility to the point of legacy version support, people can use the raw metadata and do what they want with raw["version"] which just returns a string. For everyone else, data.version will give you Version and raise an exception for anything invalid.

@pombredanne
Copy link

FWIW, I need to parse eventually any Python version in the various tools I maintain for dependency resolution and vulnerable version range processing.
This may not be what you call mainstream but that's used to collect inventories of Python packages as used as seen in the wild for security and license concerns in quite a few places and large orgs. In the wild means that weird and old dusty corners of the Python world are explored daily with legacy version being unfortunately common enough.

@pradyunsg kindly warned long before dropping LegacyVersion but this was still a source of intense churn in December when packaging 23 was released. I then created a temp fork of packaging called https://github.com/nexB/packvers/ (which is now used in several tools as a stop gap) and experimented with vendoring just the legacy version code in https://github.com/nexB/pip-requirements-parser/blob/main/src/packaging_legacy_version.py . pip-requirements-parser is used in pip-audit and a few other places and is reported as being "critical" by PyPI.

.... BUT none of these solutions (packvers or vendoring) is really satisfying nor working correctly wrt. subclassing and interop with packaging. I'd like to help in anyway I can and it would be awesome to get something clean I can migrate to!

@di
Copy link
Member Author

di commented Jan 26, 2023

@pombredanne Thanks for chiming in. Seems like this and https://github.com/di/packaging_legacy should suit your needs.

@pypa/packaging-committers IMO, we should move forward with this given the discussion here.

@di di requested a review from a team January 26, 2023 15:51
@brettcannon
Copy link
Member

I'm good with accepting this if @pradyunsg is.

@di
Copy link
Member Author

di commented Feb 1, 2023

@pradyunsg Friendly bump here. I'd like to get PyPI onto the latest version of packaging sooner than later.

@pradyunsg pradyunsg merged commit 28c1a05 into pypa:main Feb 6, 2023
@di di deleted the extendable-baseversion branch February 6, 2023 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants