-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SPDX format support for SBOM #608
base: main
Are you sure you want to change the base?
Add SPDX format support for SBOM #608
Conversation
I thought I will eventually integrate the changes into new subcommand implemented here: #593 but it wasn't merged yet when I started working on this one |
5e78923
to
e64cec4
Compare
e64cec4
to
4a10a9d
Compare
cachi2/core/models/sbom.py
Outdated
SPDXPackage( | ||
name=component.name, | ||
version=component.version, | ||
package_download_location=component.purl, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will replace component.properties
? Those carry crucial information. Enterprise Contract inspects them and, for example, blocks builds if the SBOM has any cachi2:pip:package:binary
properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's anything equivalent for spdx package. See https://spdx.github.io/spdx-spec/v2.3/package-information/. Syft convert discards this information. I'll check with EC devs how to approach this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A solution could be to keep using CycloneDX in cachi2 and not even add SPDX support. If a lossy conversion is all that Konflux needs, the build pipeline can use syft convert
for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the release pipeline can do that. TBH I fail to see why builds would have to publish SPDX already. Is there a design doc for this?
cachi2/core/models/sbom.py
Outdated
key=lambda ref: (ref.referenceLocator, ref.referenceType, ref.referenceCategory), | ||
) | ||
|
||
return sorted(list(unique_items.values()), key=lambda item: (item.name, item.version)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here, list() is not needed
cachi2/core/models/sbom.py
Outdated
SPDXPackage( | ||
name=component.name, | ||
version=component.version, | ||
package_download_location=component.purl, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this parameter defined? package_download_location
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this function needs separate and better unittest, if expected SBOM is really good one, current test testing this is shady as it basically just tests if f()==f()
with few extra steps
typo in commit message and PR: s/SDPX/SPDX/ |
e08cfd6
to
0f308c0
Compare
Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, a couple of minor changes is needed. Edit: you would also need to make sure UTs pass.
|
24872b7
to
98d68df
Compare
I get your idea, but what if you decide to use new properties? (Our) Tooling reading SPDX annotations could depend on rule if annotator is cachi2 then comment is string in json format. Then if you add property |
To clarify, my proposal is to omit the prefix we add to every Cachi2-generated property, since the annotator is already Cachi2. So it's not the value that becomes the annotator, as you mentioned above. The second part of the proposal is to remove the json formatting, since it does not add anything to the info.
This should only apply to Cachi2-generated propertis, though. It is all we deal with for now. |
# Remove extra fields which are not in Sbom or SPDXSbom models | ||
# Both SBom and SPDXSBom models are only subset of cyclonedx and SPDX specifications | ||
# Therefore we need to make sure only fields accepted by the models are present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to perform this cleanup? I get that our models are only a subset of the full spec, but since this function is limited to support only Cachi2-produced SBOMs, I believe we could skip this sanitization here. Any SBOM that contains extra fields was not produced by Cachi2, and we don't need to handle them, we can just fail the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this santitization will make the unit test that uses a Syft SBOM fail... but I think that is the expected behavior, so we need to fix that test.
FAILED tests/unit/test_cli.py::TestMergeSboms::test_a_user_can_successfully_save_sboms_merge_results_to_a_file_in_spdx_format[sbom_files_to_merge0] - AssertionError: Error: UnexpectedFormat: /home/bpimente/Projects/cachi2/tests/unit/data/sboms/syft.bom.spdx.json does not appear to be a valid Cachi2 SBOM.
Please check if the format of your file is correct.
If yes, please let the maintainers know that Cachi2 doesn't handle it properly.
cachi2/interface/cli.py
Outdated
sbom: Union[Sbom, SPDXSbom] = Sbom( | ||
components=merge_component_properties( | ||
chain.from_iterable(cast(Sbom, s).components for s in sboms_to_merge) | ||
chain.from_iterable(cast(Sbom, s).components for s in cyclonedx_sboms_to_merge) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to use cast
here? mypy
seems to be fine without it. (same applies to the other uses of cast
in this function)
cachi2/core/models/sbom.py
Outdated
if ref.referenceType == "purl": | ||
purls.append(ref.referenceLocator) | ||
|
||
# cyclonedx doens't support multiple purls, therefore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: doens't
cachi2/core/models/sbom.py
Outdated
annotator="cachi2", | ||
annotationDate=datetime.datetime.now().isoformat(), | ||
annotationType="OTHER", | ||
comment=json.dumps({"name": f"{prop.name}", "value": f"{prop.value}"}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the purpose of the comment to be informative?
I need to list packages found by cachi2, and I'm wondering if this enough:
annotation.annotator == "cachi2"
annotation.annotationType == "OTHER"
Or do I also need to parse the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @brunoapimentel or @chmeliik know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK the comments are the replacement for CycloneDX properties
cachi2/cachi2/core/models/sbom.py
Lines 7 to 13 in b289c91
PropertyName = Literal[ | |
"cachi2:found_by", | |
"cachi2:missing_hash:in_file", | |
"cachi2:pip:package:binary", | |
"cdx:npm:package:bundled", | |
"cdx:npm:package:development", | |
] |
But if you just need to list the packages found by cachi2, annotation.annotator == "cachi2"
should be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to speak behalf of cachi2 developers. I would say yes. But I wouldn't recommend it. With that approach you would simply rely on internal implementation detail which I believe isn't going to be guaranteed to be preserved in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. For found_by
, that's probably ok, but when we get to other properties, we'll very much need to know more details. If comment is the only attribute available to stuff that information, then we'll need that to be stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that annotation.annotator
will always be cachi2
, since all the properties are inserted by Cachi2. This is also true for standard CycloneDX properties (such as cdx:npm:package:bundled
).
In the context of Konflux, there will be annotations inserted by other tools in the final SBOM, but I'd say you can count on the annotator field to pick those out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, some optional nitpicks, but I have a couple of small yet blocking questions on the new code. Also could you please squash those commits which revert changes in the earlier commits from the same PR?
cachi2/core/models/sbom.py
Outdated
) | ||
) | ||
# if there's no purl and no package name or version, it's just wrapping element for | ||
# spdx package which is layer bellow SPDXDocument in relationships |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/layer bellow/one layer below/
cachi2/core/models/sbom.py
Outdated
) | ||
# if there's no purl and no package name or version, it's just wrapping element for | ||
# spdx package which is layer bellow SPDXDocument in relationships | ||
if not purls and not package.name and not package.versionInfo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking opinionated nitpick: not any((purls, package.name, package.versionInfo))
reads slightly better.
cachi2/core/models/sbom.py
Outdated
"""Convert a SPDX SBOM to a CycloneDX SBOM.""" | ||
components = [] | ||
for package in self.packages: | ||
properties = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly recommend going declarative on both properties
and purls
:
properties = [Property(**json.loads(an.comment)) for an in package.annotations]
purls = [ref.referenceLocator for ref in package.externalRefs if referenceType == "purl"]
Will not block on this one, but it does make code tidier.
cyclonedx_sboms_to_merge = [] | ||
for _sbom in sboms_to_merge: | ||
if not isinstance(_sbom, Sbom): | ||
cyclonedx_sboms_to_merge.append(_sbom.to_cyclonedx()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we want to merge SBOMs in different formats? If yes then wouldn't it make sense to declaratively and unconditionally convert them to same format first, then merge, then convert the result to the expected output format? Something like this:
sboms_to_merge = [sbom.to_cyclonedx() for sbom in sboms_to_merge]
resulting_sbom = Sbom(components=merge_component_properties(...), ...)
if sbom_type == SBOMFormat.spdx:
resulting_sbom = resulting_sbom.convert_to_spdx()
Edit: spelling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this and I'm not sure if that good idea regarding to multiarch sboms. I haven't seen one yet, but I assume it will contain hierarchy like: root -> virtual-arch-package -> [ arch packages, ... ]. That would be lost if converted to spdx, so I think we should avoid any conversion if it's not necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, Cachi2 does not produce multiarch SBOMs, so we don't need to support merging those. We can update the merging script once that changes, but I don't see it happening in the near future.
tests/unit/models/test_sbom.py
Outdated
@@ -462,6 +464,34 @@ def test_to_spdx(self, isodate: datetime.datetime) -> None: | |||
), | |||
] | |||
|
|||
def test_to_cyclonedx_around(self, isodate: datetime.datetime) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is rather cryptic. What do you think about
test_spdx_sbom_can_be_converted_to_cyclonedx_and_back_without_loosing_any_data
?
The same question applies to all _around
tests in this PR.
tox.ini
Outdated
@@ -13,7 +13,7 @@ usedevelop = true | |||
deps = | |||
-rrequirements-extras.txt | |||
commands = | |||
pytest \ | |||
pytest -vv \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related to the PR?
There's property |
My initial thought was:
But seeing the other thread you're keeping with Lui made me realize that Cachi2 adds standard CycloneDx properties (such as I still don't quite like the inline JSON, and also that it has I also saw that SPDX 3.0 adds a |
717769b
to
be48ebb
Compare
I agree, but on the other hand, json dumps will handle any escaping, so as property name and value you can have anything which is compatible with cyclonedx property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the scope of this PR keeps growing. It might make sense splitting it into several PRs.
cachi2/interface/cli.py
Outdated
) -> List[SPDXRelation]: | ||
"""Merge SPDX relationships. | ||
|
||
Function takes two relationsships lists and unified list of packages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/relationsships/relationships/
cachi2/interface/cli.py
Outdated
"""Merge SPDX relationships. | ||
|
||
Function takes two relationsships lists and unified list of packages. | ||
For relationhips lists, map and inverse map of relations is created. SPDX document usually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/relationhips/relationships/
s/is/are/
cachi2/interface/cli.py
Outdated
Function takes two relationsships lists and unified list of packages. | ||
For relationhips lists, map and inverse map of relations is created. SPDX document usually | ||
contains virtual package which serves as "envelope" for all real packages. These virtual | ||
packages are search in the relationsships and thier ID is stored as middle element. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/are search/are searched/
s/relationsships/relationships/
s/thier/their/
@@ -405,6 +405,94 @@ def all_files_are_jsons(sbom_files_to_merge: Paths) -> Paths: | |||
return all_files_are_jsons(enough_files_for_merge(list(set(sbom_files_to_merge)))) | |||
|
|||
|
|||
def merge_relationships( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two observations: first, this is not a part of CLI and second, the logic is quite complex, but there are no tests to ensure that it is done correctly. I think this function should be moved, split into high-level and low-level parts and covered with tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where would you move it?
cachi2/interface/cli.py
Outdated
root_ids = [] | ||
maps = [] | ||
inv_maps = [] | ||
middles = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if envelop_packages_ids
would be better in conveying what this variable refers to? I found myself going back to the docstring several times to refresh what 'middle' and 'middles' mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can rename it.
@@ -405,6 +405,94 @@ def all_files_are_jsons(sbom_files_to_merge: Paths) -> Paths: | |||
return all_files_are_jsons(enough_files_for_merge(list(set(sbom_files_to_merge)))) | |||
|
|||
|
|||
def merge_relationships( | |||
relationships_list: List[List[SPDXRelation]], doc_ids: List[str], packages: List[SPDXPackage] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A list is a sequence of variable length, the docstring states that the function takes two relationship lists. If it is just two then it would make more sense to have 'this' and 'that' r_lists and to merge them. Or some better name which would let everyone see that there are two distinct entities to be processed. 'main' and 'other' are good candidates given that those are already used in the code.
If it is not about merging two lists then I strongly recommend making it about merging two lists and then reducing any list of lists with it if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I created the method for two lists but I realized cachi2 actually can take multiple files.
If it is not about merging two lists then I strongly recommend making it about merging two lists and then reducing any list of lists with it if needed.
But that would mean another merge method call. I think merging all together is more efficient and calling merge(A,B) in cascade.
This is the last change. It won't grow more |
Support for SPDX format was added to fetch-depds command and also to merge_syft_sboms. No changes were made in particular package manager generating components which are then converted to cyclonedx format. SPDX sbom can be obtained by calling Sbom.to_spdx(). New switch sbom-type was added to merge_syfy_sboms, so user can choose which output format should be generated - default is cyclonedx. Once all tooling is ready to consume spdx sboms, cutoff changes in this repository can be started. Signed-off-by: Jindrich Luza <[email protected]>
…file_spdx - package hash calculation helper - better sbom_name attribute description Signed-off-by: Jindrich Luza <[email protected]>
SPDXRef-DocumentRoot-File- includes all spdx packages and is set to be described by SPDXRef-DOCUMENT. This way of spdx generation is closer to way syft generates spdx Signed-off-by: Jindrich Luza <[email protected]>
Signed-off-by: Jindrich Luza <[email protected]>
Signed-off-by: Jindrich Luza <[email protected]>
- minor refactoring - renamed tests - fixed typos Signed-off-by: Jindrich Luza <[email protected]>
Signed-off-by: Jindrich Luza <[email protected]>
- Packages with no versions are deduplicated Signed-off-by: Jindrich Luza <[email protected]>
Signed-off-by: Jindrich Luza <[email protected]>
a2606bc
to
28fc1a7
Compare
- renamed middles to envelopes Signed-off-by: Jindrich Luza <[email protected]>
Signed-off-by: Jindrich Luza <[email protected]>
28fc1a7
to
4f1cb8c
Compare
- improved spdx merge tests Signed-off-by: Jindrich Luza <[email protected]>
Support for SPDX format was added to fetch-depds command and also to merge_syft_sboms.
No changes were made in particular package manager generating components which are then converted to cyclonedx format. SPDX sbom can be obtained by calling Sbom.to_spdx().
New switch sbom-type was added to merge_syfy_sboms, so user can choose which output format should be generated - default is cyclonedx. Once all tooling is ready to consume spdx sboms, cutoff changes in this repository can be started.
Maintainers will complete the following section
Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:
/ok-to-test
(as is the standard for Pipelines as Code)