Feature: Add support for requirements in the form of PackageURLs (purls) #40
FWIW many CPAN dists that use MariaDB or MySQL do not really care which of the two is their backend, as long as their tool is supported and the DBD works. |
For details about the spec, see the Package URL specification and the overview of PURL Types. There you'll see that As for having multiple alternative PURLs referring to the same (or equivalent) dependencies, that shouldn't be a problem? Just list all of them? If one resolves a package URL on a Debian system With that said, I guess there may be a need for the option to specify multiple equivalent requirements, so that on a system where both mysql and mariadb are available, only one of them is actually installed? Unsure how that should be specified, and unsure whether this is in scope for CPAN::Meta::Requirements... |
I don't think completely changing the meaning of those fields at this stage is a workable solution. There are too many things that have interpreted them as package names for too long.
This is not what PURLs are designed to do, and the problem inherently requires something more fuzzy. It needs to resolve
IMO it falls well outside the scope of C::M::R |
Ah, apologies for not being clear about my intentions. I'm absolutely not proposing to "completely change the meaning" of those fields. I'm proposing an addition. The old ways should of course continue to work as always (anything else would be reckless). 😅
I'm unsure what you mean here. Could you expand on the reasoning behind what you're saying? My understanding of PURLs is that they are designed for the purpose of identifying specific packages within a package ecosystem+namespace. If two ecosystems refer to the same code using different names, then PURLs exist to help make this possible in a standardized way that is ecosystem-independent. Is your understanding different somehow? (edit: I've added a few examples to the OP) |
And everything that handles this field now having two meanings. That's going to cause a lot of breakage.
There are too many ecosystems, listing packages this explicitly isn't workable. What we actually need to do is map things repology style, probably even by using repology. |
Appreciate you're spending some calories on this issue! 😁 I'd still love to hear your reasoning behind your "This is not what PURLs are designed [...]" statement, though!
Ok, how? Can you come up with an example of breakage? I can imagine some kind of feature guard that downstream tooling (which hasn't come around to supporting PackageURLs) can use to continue working unaffected, even if CPAN::Meta::Requirements should be upgraded behind the scenes. And if the tooling eventually does something with this new feature in C::M::R, it just has to state its minimum required version accordingly, as always. Isn't that enough to allow an upgrade path that doesn't break any tooling downstream?
Well, sure. There are many ecosystems out there, and it's not polite to ask a developer to list all combinations of ecosystems and package names in their Makefile.PL. Luckily, we don't have to solve this problem right here and right now. I guess it's feasible to optionally make use of some repology-based PURL translation service (or library, if the matrix isn't too large) to map between package names in different ecosystems. Not sure how this should be done, but I can't imagine this is too difficult. It would certainly be a welcome convenience. In the meantime, just stating the most common type+namespace+packagename combinations would make a big positive difference! And if some ecosystem users feel left out, there's always the option to offer a PR to add the missing one. 🙂 Also, I think it's worth noting that the issue you are pointing out really isn't an argument against the introduction of PackageURLs. There's still a need to be able to specify out-of-ecosystem dependencies – which we for any practical purposes DO NOT support at all right now – certainly not in an ecosystem-agnostic standardized manner! With that said, PackageURLs are not the only available option for uniquely identifying dependencies across ecosystems. The problem is that the other options right now are really bad. We could do something with SWID tags, but they are horrible and require a centralized index. We could also use CPEs (Common Platform Enumeration), which are also horrible, or we could use OmniBOR, which is some proprietary horror not worth touching. If you want some reading material on this topic, check out the highly relevant Software Identification Ecosystem Option Analysis by CISA, published October 2023. They cover the problem domain of unique software identifiers quite well.
The only promising option IMO is PackageURLs, and while they by themselves don't solve the translation problem that you point out (and that repology attempts to solve), they are still the best option when one wants to specify requirements across ecosystem boundaries. The act of resolving these requirements, I think, is a solvable problem, and an interesting discussion in itself, but probably suited for another forum? I guess there may be something to learn from repology when it comes to identifying what packages are called in the different ecosystems, but for the discussion we're having here, this isn't relevant. My proposal is for introducing PackageURL support in CPAN::Meta::Requirements, so that downstream consumers of this module can eventually implement support for the goodies this enables – with or without the help from repology. If this feature isn't added in this module, we can be certain nothing happens downstream. I'm sure that if someone puts together a concept that allows us to do the same, but only using repology, then that's definitely worth consideration! But WRT this ticket, I don't think this should be a blocker. 😸 |
I think you're suggesting adding PURL support to CPAN::Meta::Spec, so that The key problem, which Leon mentioned, is the devolved / distributed nature of the CPAN ecosystem. While we have lots of standard modules for processing metadata, there are lots of things out there where people have created systems which process metadata, and we've no way of knowing where breakage would happen. I've got a bunch of tools I've written over the years, and I know some of them would break. But that's just me. More worrying is that key parts of the ecosystem might break. So any support for PURLs would have to be alongside, rather than changing the existing core mechanisms, I think. Perhaps the place to start would be outlining the concrete benefits that CPAN authors and users, and the ecosystem maintainers, would get from PURL support, and making the case for it being worth the upheaval, and then a path to making that happen. I don't mean on this ticket, I mean elsewhere ;-) |
Yeah, this. I'm not arguing against the goal at all, but I do think it needs to be a separate field. |
Are you thinking of my comment to Perl-Toolchain-Gang/CPAN-Meta#79? That comment is starting to show its age! 😅 If you think that's a better place to have this conversation, I'm happy to move it there. But wouldn't the implementation happen in CPAN::Meta::Requirements in any case? (not sure, so I'm happy to be corrected)
Could you share an example where breakage will happen? (I asked Leon the same). I guess that if someone decides to make use of this feature, they might add a requirement in the form of I'm not claiming that things won't break - and certainly not for the situation where someone rolls with their own parser instead of using some Toolchain-Gang supplied module for doing this. I guess under "normal" circumstances we could just rest on an implementation that follows the Liskov Substitution Principle, but with that being unlikely, wouldn't it still be feasible to reduce the size of a fix from do_something( $module_name, $version_range ); to... do_something( normalize_name($module_name), normalize_version_range($version_range) ); ...? I'm thinking that since URI::PackageURL is nearing a usable state for this, we now have a deterministic way to translate between purls and module names or dist names (depending on how the purl is written), and back. And since purls are just another way of writing module or dist names, I think it's meaningful to prioritize preserving the semantics (meaning, preserve the structure & meaning of how to specify requirements) instead of creating a separate parallel way of specifying dependencies. My intuition is that the amount of code to handle the first is less than to handle the second way. I'm struggling to see any "upheaval" here, so I'd still love to see examples of what you speak of...
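To make the `normalize_name(...)` idea above concrete, here is a minimal, language-neutral sketch in Python rather than Perl. It is not the API of CPAN::Meta::Requirements or URI::PackageURL; the function body and the prefix constant are assumptions for illustration. It relies only on the convention mentioned in this thread that a `pkg:cpan/...` purl without a namespace denotes a module name, and it assumes the scheme and type are written in lowercase.

```python
from urllib.parse import unquote

CPAN_PURL_PREFIX = "pkg:cpan/"  # assumption: scheme and type written in lowercase


def normalize_name(key: str) -> str:
    """Return a plain module name for a legacy key or a module-form cpan purl."""
    if not key.startswith("pkg:"):
        return key  # legacy module name: pass through unchanged
    if not key.startswith(CPAN_PURL_PREFIX):
        raise ValueError(f"not a CPAN purl: {key}")
    # Strip subpath (#...) and qualifiers (?...), then the trailing @version.
    rest = key[len(CPAN_PURL_PREFIX):].split("#", 1)[0].split("?", 1)[0]
    path = rest.rsplit("@", 1)[0]
    if "/" in path or not path:
        # A namespace (author) segment means this names a distribution,
        # which does not map to exactly one module.
        raise ValueError(f"cannot reduce to a single module name: {key}")
    return unquote(path)  # "::" may arrive percent-encoded as %3A%3A
```

A caller that only understands module names could wrap its existing keys with this and still reject anything it cannot interpret, instead of silently treating the purl as a module name.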
I think I can put together something, but I'm wary of having this ticket depend on some community consensus of sorts. I believe the examples I gave in the OT should illustrate the main benefits well enough for anyone who cares about this topic and module, and if they do not, please tell me what is missing! I've purposefully not mentioned downstream benefits like "Easier generation of SBOMs that can be reused downstream without custom modification of component names" or "The possibility for non-CPAN software to specify dependencies in an ecosystem-agnostic way to software found on CPAN" or "A standardized ecosystem-agnostic way to refer to CPAN components that have known vulnerabilities" (e.g. PURL use in OpenVEX), or... Well, you get the point. Even a quick read of the CISA Analysis I linked to earlier would help paint a picture of what's at stake, and how PURL support in ecosystems plays into this as a solution. PURLs are still a "new" thing, and there's definitely a need for sharing info about their uses and benefits, though if there are downstream users that may be affected, do you really think they'll even read any blog post about this, let alone share any thoughts on the matter? They'll definitely learn of PURLs if they have some code that breaks, though – and if the fix is trivial (e.g. like above), then that's a more reliable way to both get the word out, and to make things happen... |
The issues list for any distribution isn't the right place for this. You're proposing a major change to the underpinning of the CPAN toolchain, but then jumping down into the details. I think you need to back up and whether it's a blog post or a document somewhere, start with:
You're not getting people leaping to help you on this, because you haven't sold people a vision. That's where you need to start. It may be that you shared some of this at the PTS recently ...
If you can't see why people are nervous about mucking with the metadata that underpins so much of the CPAN ecosystem, then that just reinforces the need for the above piece of work first. It may be that the final set of changes would be relatively small, but it's going to take a lot of time, effort, and people's buy-in, to get to there. |
Why would it? CMR is a mapping of modules to version requirements, nothing more nothing less. Actually handling version requirements is like 90% of it really.
Lots of tools assume the keys are module names, if that's suddenly no longer true they'll get mightily confused. It's a variety of things like cpan clients, testing, authoring, packaging, etc. It seems like a far better idea to make this a separate field. |
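As one hypothetical illustration of the kind of confusion described above (not taken from any named tool): a common step is turning a requirement key into a `.pm` path on the assumption that the key is a module name. The sketch below is in Python for neutrality, and `module_to_path` is an invented helper.

```python
def module_to_path(module_name: str) -> str:
    """Treat a requirement key as a module name and derive its .pm path."""
    return module_name.replace("::", "/") + ".pm"


# A classic module-name key yields a loadable path.
print(module_to_path("CPAN::Meta::Requirements"))
# prints: CPAN/Meta/Requirements.pm

# A purl key passes through the same code without any error,
# but the result is a path that no module loader will ever find.
print(module_to_path("pkg:cpan/CPAN::Meta::Requirements"))
# prints: pkg:cpan/CPAN/Meta/Requirements.pm
```

Nothing raises here, which is the point: code like this fails later and further away, when the derived path is looked up.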
Sure, though I'm wary about making this into some public discussion, I guess I can put something together. I won't be able to do all of these, because some of the points you ask for do belong in an issue tracker, and I think it would be a waste of my time to figure out the details when others who know the implementation could do the same with 1/10 of the effort. I'll see what I can do.
Aah, well. The topic of PackageURLs has actually been a recurring theme both at PTS and in the CPANSec channel, though I guess there are many here who haven't been following those places. :-|
Yes.
That's why I'm also asking for examples. To put together a relevant case, I need to know the needs and concerns of the target audience (you), and that is why I (repeatedly!) ask for examples. Please show me examples. Don't tell me that you know stuff. Show me examples. (And of course, I'll be happy to offer my thanks when you do! But offering thanks before something is done, is putting the cart before the horse, I'm told. 😉 ) |
For those of you who are unfamiliar with PackageURLs and want to get a quick introduction while you wait for me to write something CPAN-specific, check out https://archive.fosdem.org/2022/schedule/event/package_url_and_version_range_spec/ 🙂 |
The primary problem is that any currently existing CPAN client would fail to resolve module |
Ok, so what are you asserting here? Is it that the specific requirement to the requirements-parsing module (when using the new syntax) needs special-casing during bootstrap? (an example to support your assertion would be useful). Or do you mean that any PackageURL would fail to resolve, no matter what is implemented in CPAN::Meta::Requirements? In the bootstrapping case, I guess that's something that needs to be taken into account in CPAN.pm and other build tooling, and I see after a cursory glance that there already exists code for doing something similar at least in CPAN.pm... So yes, I see that point, though I guess it's possible to add features to this dist without losing feature compatibility with older releases? Another option could be to immediately introduce some appropriate signal (warning, error, whatever) to CPAN::Meta::Requirements, that communicates something useful when an unknown module name is encountered. I see there's already some code to this effect for version ranges, so doing something similar for module names wouldn't seem like a huge step, I think... In the second case, wouldn't it be enough for a (non-bootstrapping) CPAN dist that decides to use purls to specify minimum version requirements for the tooling that is used during the distribution's configure or build phase? e.g. add |
The current way package URLs are specified for CPAN is IMO broken, and I wouldn't want to integrate it into any part of our toolchain. The Package URL spec is rather incomplete at the moment. It doesn't include anything about what the semantics of its URLs are. But it does imply some semantics. It states:
This implies that a Package URL is meant to be an identifier for a software package, which would mean some redistributable software package. In terms of CPAN, this would have to be a release tarball. A namespace is defined as:
So the namespace is something the name exists within. This mostly maps to the CPAN author. Package URLs for CPAN use CPAN purls also support So now there are two different types of purl sharing the same type, useful for mostly distinct purposes. As mentioned by others, specifying external dependencies is also problematic. It's not useful to have to include dependencies on every variant of a library from every packaging system. Package URLs as specified seem more designed for something like SBOM. I do not think they are fit for the purpose of specifying dependencies. |
It's not a parsing issue, it's a semantics issue, current CPAN clients do not know how to interpret such values. I have no idea what misconception you have that makes you think this could work.
No it wouldn't. That is too late, and at least in case of cpanm ineffective anyway (it uses a bundled CMR). |
I think we already covered this topic in a discussion on CPANSec IRC, but I guess it's also worth repeating here for posterity. :-) PackageURLs by themselves are only suited for referring to resolved dependencies. This means they are useful for lockfiles, installation reports, SBOMs (as you say) or other situations where you either want to know exactly what was installed and where it came from, or when you want to reproduce a build. In these cases, the If the tooling understands full distnames (e.g. This means that In the case above, where the tooling understands prereqs in the form of But the common case is that prerequisites include some version constraints, and to manage this with PackageURLs, they must be accompanied by a "vers" version range URL.
The "different dependency type" you speak of, is that the first one ( The second form can be useful in an SBOM in the sense that we can make it possible to refer to new types of dependencies in a standard manner. I gave a few of these as examples above, but with a little imagination I think we all can come up with examples that can improve dependency resolution across not just CPAN but in general.
Now, whether or not the purl spec is actually "broken" in its current form, that's a really good conversation to explore either in an issue in URI::PackageURL, or in the purl-spec repo. Could you formulate a test case where the current syntax breaks down? The purl-spec is currently undergoing "cleanup" as part of a standardization process in ECMA's Technical Committee 54 (agendas & minutes mentioning purl), so any concerns of substance that you have are very timely to raise right now. |
Yes, this is all fine with the
Version constraints don't work with the
So you agree that you are stuffing two distinct types into one.
The problem is the concept, not the syntax. |
Oh, I don't have a conception that PURLs will be working immediately and out of the box just like that. No worries about that! What I do have a conception of is that if this tooling is ever going to implement support for PackageURLs, then it's important that any underlying modules do the right thing before it's time to implement it in the tooling. So I'm looking for modules with "separate concerns" like this one, to see if it's possible to make something happen here.
Too late under which circumstances? How? (as for cpanm, let's just limit ourselves a little and declare that it is out-of-scope for this discussion for now.) |
Ok, sure, though not entirely correct (as things are now, leaving out the namespace means you're writing a module name, with the corresponding naming limitations). If there is a use case here that is important, then there's still time to update the spec. The current form came out of the discussion in this ticket, and while the proposed changes were merged into purl-spec in February, I'm optimistic that if there are some real concerns, they can be addressed. Would you mind adding your thoughts, accompanied by an illustrative example, to that ticket? 😉
Hehe. "stuffing". Love the seriousness. 😁 Yes, there are already two distinct use cases + corresponding syntaxes that need to be covered when referring to packages on CPAN – 1) module+version prereqs, and 2) their resolved distribution names – and since there are two distinct ways to represent these, and they are already in use throughout CPAN, then what's your problem with using two "variants" of a new syntax to represent the same in PURLs? It almost seems like you're just arguing for the lols here, by asking for a fix to a fundamental design misfeature that was created more than two decades ago...
The concept mirrors reality as it is on CPAN right now. If you can think of a way to represent the necessary nuances with a "cleaner" concept, then please share! I've spent some time thinking of alternatives (some of you can read in that issue I linked above), and I'd love to see an improvement to this. |
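The two "variants" discussed in this exchange can be told apart mechanically. The following is a rough Python sketch, assuming only what is stated above: a cpan purl with a namespace (the author) refers to a distribution, and one without a namespace refers to a module. `CpanPurl`, `parse_cpan_purl`, and the example names `AUTHOR`, `Some-Dist`, and `Some::Module` are hypothetical; this is not the URI::PackageURL interface, and it skips most of the validation a real parser would do.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class CpanPurl(NamedTuple):
    kind: str                 # "module" or "distribution"
    namespace: Optional[str]  # CPAN author id, only for distributions
    name: str
    version: Optional[str]


def parse_cpan_purl(purl: str) -> CpanPurl:
    prefix = "pkg:cpan/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a CPAN purl: {purl}")
    # Ignore subpath (#...) and qualifiers (?...) for this sketch.
    rest = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]
    path, sep, version = rest.rpartition("@")
    if not sep:  # no "@": the whole remainder is the path
        path, version = rest, None
    segments = [unquote(s) for s in path.split("/") if s]
    if len(segments) == 1:
        return CpanPurl("module", None, segments[0], version)
    if len(segments) == 2:
        return CpanPurl("distribution", segments[0], segments[1], version)
    raise ValueError(f"unexpected path in CPAN purl: {purl}")


# Hypothetical examples of the two variants:
module_req = parse_cpan_purl("pkg:cpan/Some::Module@1.0")
dist_ref = parse_cpan_purl("pkg:cpan/AUTHOR/Some-Dist@1.0")
```

Whether a single `cpan` type should carry both kinds is exactly what is disputed here; the sketch only shows that consumers can branch on `kind` if it does.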
I have no idea what you mean by this, and I suspect it may be at the core of our lack of communication.
This confuses the hell out of me. How can it be out of scope? |
Hi!
I'd like to propose adding support for specifying requirements in the form of PackageURLs (purls), to work in addition to the existing ways (using dist/module names).
With this, I'm hoping that we can get a step closer to supporting requirements that work across ecosystem boundaries.
e.g. the following...
...could be written as...
...and while this is fine, this also opens for a bunch of really cool new things!
I'm also hoping this will be a foundation for allowing non-CPAN software to state any requirements it has for components published on CPAN, and maybe even one day help packagers (the folks who re-package CPAN dists into .deb or .rpm or other package archives) have an easier time figuring out how to translate and resolve dependencies across ecosystem boundaries. 😁
But for CPAN's case, I'm thinking support for purls starts with CPAN::Meta::Requirements?
I'm not entirely sure what's the best way to go about this, but since @giterlizzi recently added support for the 'vers' schema in URI::PackageURL, I'm thinking that's a place to start looking.
Should that module be made smaller/leaner? Are there other requirements (eg. around governance) that need to be fulfilled?
What needs to be in place for a feature like this to be added to CPAN::Meta::Requirements?
(edit: added some more examples and clarifications)