Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SPDX format support for SBOM #608

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

midnightercz
Copy link

Support for SPDX format was added to fetch-depds command and also to merge_syft_sboms.
No changes were made in particular package manager generating components which are then converted to cyclonedx format. SPDX sbom can be obtained by calling Sbom.to_spdx().
New switch sbom-type was added to merge_syfy_sboms, so user can choose which output format should be generated - default is cyclonedx. Once all tooling is ready to consume spdx sboms, cutoff changes in this repository can be started.

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

@brunoapimentel
Copy link
Contributor

Just a quick heads-up: the utils/merge_syft_sbom.py script was removed from this repo and now lives here. So you'll need to propose the related changes there instead.

@midnightercz
Copy link
Author

Just a quick heads-up: the utils/merge_syft_sbom.py script was removed from this repo and now lives here. So you'll need to propose the related changes there instead.

I thought I will eventually integrate the changes into new subcommand implemented here: #593 but it wasn't merged yet when I started working on this one

cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
SPDXPackage(
name=component.name,
version=component.version,
package_download_location=component.purl,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will replace component.properties? Those carry crucial information. Enterprise Contract inspects them and, for example, blocks builds if the SBOM has any cachi2:pip:package:binary properties

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's anything equivalent for spdx package. See https://spdx.github.io/spdx-spec/v2.3/package-information/. Syft convert discards this information. I'll check with EC devs how to approach this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A solution could be to keep using CycloneDX in cachi2 and not even add SPDX support. If a lossy conversion is all that Konflux needs, the build pipeline can use syft convert for that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the release pipeline can do that. TBH I fail to see why builds would have to publish SPDX already. Is there a design doc for this?

@eskultety eskultety changed the title [STONEBLD-2714] Added SDPX format support for SBOM Add SDPX format support for SBOM Aug 27, 2024
key=lambda ref: (ref.referenceLocator, ref.referenceType, ref.referenceCategory),
)

return sorted(list(unique_items.values()), key=lambda item: (item.name, item.version))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also here, list() is not needed

SPDXPackage(
name=component.name,
version=component.version,
package_download_location=component.purl,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this parameter defined? package_download_location

Copy link
Contributor

@MartinBasti MartinBasti Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this function needs separate and better unittest, if expected SBOM is really good one, current test testing this is shady as it basically just tests if f()==f() with few extra steps

@MartinBasti
Copy link
Contributor

typo in commit message and PR: s/SDPX/SPDX/

cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
@midnightercz midnightercz changed the title Add SDPX format support for SBOM Add SPDX format support for SBOM Aug 29, 2024
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Show resolved Hide resolved
cachi2/core/models/sbom.py Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
@midnightercz
Copy link
Author

Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something?

cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@a-ovchinnikov a-ovchinnikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, a couple of minor changes is needed. Edit: you would also need to make sure UTs pass.

cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
tests/unit/test_cli.py Show resolved Hide resolved
tests/unit/test_cli.py Outdated Show resolved Hide resolved
@a-ovchinnikov
Copy link
Collaborator

Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something?

merge-sboms is a thin wrapper around an internal function that is tested elsewhere. It does not do much except some validation of input and then invoking a method for saving merge results (which is also tested elsewhere). There is not much value in doing this sort of test.

@midnightercz
Copy link
Author

I believe we can improve the way CycloneDX's properties are being translated to SPDX. For example:

{
  "name":  "cachi2:found_by", 
  "value": "cachi2"
}

Could be turned into:

{
  "annotator": "cachi2",
  "annotationDate": "2024-10-01T11:03:40.799221",
  "annotationType": "OTHER",
  "comment": "found_by:cachi2"
}

My reasoning is that the annotator field already states that Cachi2 is the author of that annotation, so we can drop the prefix cachi2:, which serves the same purpose. Then, we could just append the value in the end of the string, instead of having the inline JSON, which is arguably harder to read.

If folks like the idea, this can be done as a follow up, no need to block this PR because of this.

I get your idea, but what if you decide to use new properties? (Our) Tooling reading SPDX annotations could depend on rule if annotator is cachi2 then comment is string in json format. Then if you add property {"name": "foo", "value": "bar"}, with your proposal it would turn into {"annotator": "bar", "comment": "foo"}. I think approach to dump whole property in json is more universal

@brunoapimentel
Copy link
Contributor

brunoapimentel commented Oct 7, 2024

I get your idea, but what if you decide to use new properties? (Our) Tooling reading SPDX annotations could depend on rule if annotator is cachi2 then comment is string in json format. Then if you add property {"name": "foo", "value": "bar"}, with your proposal it would turn into {"annotator": "bar", "comment": "foo"}. I think approach to dump whole property in json is more universal

To clarify, my proposal is to omit the prefix we add to every Cachi2-generated property, since the annotator is already Cachi2. So it's not the value that becomes the annotator, as you mentioned above. The second part of the proposal is to remove the json formatting, since it does not add anything to the info.

{
    "name": "prefix:property"
    "value":  "content"
}
{
    "annotator": "prefix"
    "comment":  "property:content"
}

This should only apply to Cachi2-generated propertis, though. It is all we deal with for now.

Comment on lines +438 to +529
# Remove extra fields which are not in Sbom or SPDXSbom models
# Both SBom and SPDXSBom models are only subset of cyclonedx and SPDX specifications
# Therefore we need to make sure only fields accepted by the models are present
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to perform this cleanup? I get that our models are only a subset of the full spec, but since this function is limited to support only Cachi2-produced SBOMs, I believe we could skip this sanitization here. Any SBOM that contains extra fields was not produced by Cachi2, and we don't need to handle them, we can just fail the request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this santitization will make the unit test that uses a Syft SBOM fail... but I think that is the expected behavior, so we need to fix that test.

FAILED tests/unit/test_cli.py::TestMergeSboms::test_a_user_can_successfully_save_sboms_merge_results_to_a_file_in_spdx_format[sbom_files_to_merge0] - AssertionError: Error: UnexpectedFormat: /home/bpimente/Projects/cachi2/tests/unit/data/sboms/syft.bom.spdx.json does not appear to be a valid Cachi2 SBOM.
    Please check if the format of your file is correct.
    If yes, please let the maintainers know that Cachi2 doesn't handle it properly.

sbom: Union[Sbom, SPDXSbom] = Sbom(
components=merge_component_properties(
chain.from_iterable(cast(Sbom, s).components for s in sboms_to_merge)
chain.from_iterable(cast(Sbom, s).components for s in cyclonedx_sboms_to_merge)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to use cast here? mypy seems to be fine without it. (same applies to the other uses of cast in this function)

if ref.referenceType == "purl":
purls.append(ref.referenceLocator)

# cyclonedx doens't support multiple purls, therefore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: doens't

annotator="cachi2",
annotationDate=datetime.datetime.now().isoformat(),
annotationType="OTHER",
comment=json.dumps({"name": f"{prop.name}", "value": f"{prop.value}"}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose of the comment to be informative?

I need to list packages found by cachi2, and I'm wondering if this enough:

annotation.annotator == "cachi2"
annotation.annotationType == "OTHER"

Or do I also need to parse the comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @brunoapimentel or @chmeliik know?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the comments are the replacement for CycloneDX properties

PropertyName = Literal[
"cachi2:found_by",
"cachi2:missing_hash:in_file",
"cachi2:pip:package:binary",
"cdx:npm:package:bundled",
"cdx:npm:package:development",
]

But if you just need to list the packages found by cachi2, annotation.annotator == "cachi2" should be enough

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. Thank you!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to speak behalf of cachi2 developers. I would say yes. But I wouldn't recommend it. With that approach you would simply rely on internal implementation detail which I believe isn't going to be guaranteed to be preserved in the future

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. For found_by, that's probably ok, but when we get to other properties, we'll very much need to know more details. If comment is the only attribute available to stuff that information, then we'll need that to be stable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that annotation.annotator will always be cachi2, since all the properties are inserted by Cachi2. This is also true for standard CycloneDX properties (such as cdx:npm:package:bundled).

In the context of Konflux, there will be annotations inserted by other tools in the final SBOM, but I'd say you can count on the annotator field to pick those out.

Copy link
Collaborator

@a-ovchinnikov a-ovchinnikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, some optional nitpicks, but I have a couple of small yet blocking questions on the new code. Also could you please squash those commits which revert changes in the earlier commits from the same PR?

)
)
# if there's no purl and no package name or version, it's just wrapping element for
# spdx package which is layer bellow SPDXDocument in relationships
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/layer bellow/one layer below/

)
# if there's no purl and no package name or version, it's just wrapping element for
# spdx package which is layer bellow SPDXDocument in relationships
if not purls and not package.name and not package.versionInfo:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking opinionated nitpick: not any((purls, package.name, package.versionInfo)) reads slightly better.

cachi2/core/models/sbom.py Show resolved Hide resolved
"""Convert a SPDX SBOM to a CycloneDX SBOM."""
components = []
for package in self.packages:
properties = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly recommend going declarative on both properties and purls:

properties = [Property(**json.loads(an.comment)) for an in package.annotations]
purls = [ref.referenceLocator for ref in package.externalRefs if referenceType == "purl"]

Will not block on this one, but it does make code tidier.

cyclonedx_sboms_to_merge = []
for _sbom in sboms_to_merge:
if not isinstance(_sbom, Sbom):
cyclonedx_sboms_to_merge.append(_sbom.to_cyclonedx())
Copy link
Collaborator

@a-ovchinnikov a-ovchinnikov Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want to merge SBOMs in different formats? If yes then wouldn't it make sense to declaratively and unconditionally convert them to same format first, then merge, then convert the result to the expected output format? Something like this:

sboms_to_merge = [sbom.to_cyclonedx() for sbom in sboms_to_merge]
resulting_sbom = Sbom(components=merge_component_properties(...), ...)
if sbom_type == SBOMFormat.spdx:
    resulting_sbom = resulting_sbom.convert_to_spdx()

Edit: spelling

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this and I'm not sure if that good idea regarding to multiarch sboms. I haven't seen one yet, but I assume it will contain hierarchy like: root -> virtual-arch-package -> [ arch packages, ... ]. That would be lost if converted to spdx, so I think we should avoid any conversion if it's not necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, Cachi2 does not produce multiarch SBOMs, so we don't need to support merging those. We can update the merging script once that changes, but I don't see it happening in the near future.

@@ -462,6 +464,34 @@ def test_to_spdx(self, isodate: datetime.datetime) -> None:
),
]

def test_to_cyclonedx_around(self, isodate: datetime.datetime) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is rather cryptic. What do you think about
test_spdx_sbom_can_be_converted_to_cyclonedx_and_back_without_loosing_any_data?
The same question applies to all _around tests in this PR.

tox.ini Outdated
@@ -13,7 +13,7 @@ usedevelop = true
deps =
-rrequirements-extras.txt
commands =
pytest \
pytest -vv \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to the PR?

@midnightercz
Copy link
Author

I get your idea, but what if you decide to use new properties? (Our) Tooling reading SPDX annotations could depend on rule if annotator is cachi2 then comment is string in json format. Then if you add property {"name": "foo", "value": "bar"}, with your proposal it would turn into {"annotator": "bar", "comment": "foo"}. I think approach to dump whole property in json is more universal

To clarify, my proposal is to omit the prefix we add to every Cachi2-generated property, since the annotator is already Cachi2. So it's not the value that becomes the annotator, as you mentioned above. The second part of the proposal is to remove the json formatting, since it does not add anything to the info.

{
    "name": "prefix:property"
    "value":  "content"
}
{
    "annotator": "prefix"
    "comment":  "property:content"
}

This should only apply to Cachi2-generated propertis, though. It is all we deal with for now.

There's property
Property(name="cachi2:missing_hash:in_file", value=str(lockfile_path)) in rpm module. How should that then be translated to annotation?

@brunoapimentel
Copy link
Contributor

There's property Property(name="cachi2:missing_hash:in_file", value=str(lockfile_path)) in rpm module. How should that then be translated to annotation?

My initial thought was:

{
    "annotator": "cachi2"
    "comment":  "missing_hash:in_file:./path/to/the/file"
}

But seeing the other thread you're keeping with Lui made me realize that Cachi2 adds standard CycloneDx properties (such as cdx:npm:package:bundled). So, cachi2: was actually a way to namespace our own custom properties, and I believe it should be kept.

I still don't quite like the inline JSON, and also that it has name and value that are part of the CycloneDX spec (it could be simplified to {"cachi2:missing_hash:in_file": "/path/to/the/file"}). I'll open a discussion about this later, as I said, I don't want this to block this PR.

I also saw that SPDX 3.0 adds a ContentType field to annotations, which we could leverage to state that it contains JSON when we move to the new version.

@midnightercz
Copy link
Author

I still don't quite like the inline JSON

I agree, but on the other hand, json dumps will handle any escaping, so as property name and value you can have anything which is compatible with cyclonedx property

Copy link
Collaborator

@a-ovchinnikov a-ovchinnikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the scope of this PR keeps growing. It might make sense splitting it into several PRs.

) -> List[SPDXRelation]:
"""Merge SPDX relationships.

Function takes two relationsships lists and unified list of packages.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/relationsships/relationships/

"""Merge SPDX relationships.

Function takes two relationsships lists and unified list of packages.
For relationhips lists, map and inverse map of relations is created. SPDX document usually
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/relationhips/relationships/
s/is/are/

Function takes two relationsships lists and unified list of packages.
For relationhips lists, map and inverse map of relations is created. SPDX document usually
contains virtual package which serves as "envelope" for all real packages. These virtual
packages are search in the relationsships and thier ID is stored as middle element.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/are search/are searched/
s/relationsships/relationships/
s/thier/their/

@@ -405,6 +405,94 @@ def all_files_are_jsons(sbom_files_to_merge: Paths) -> Paths:
return all_files_are_jsons(enough_files_for_merge(list(set(sbom_files_to_merge))))


def merge_relationships(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two observations: first, this is not a part of CLI and second, the logic is quite complex, but there are no tests to ensure that it is done correctly. I think this function should be moved, split into high-level and low-level parts and covered with tests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where would you move it?

root_ids = []
maps = []
inv_maps = []
middles = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if envelop_packages_ids would be better in conveying what this variable refers to? I found myself going back to the docstring several times to refresh what 'middle' and 'middles' mean.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can rename it.

@@ -405,6 +405,94 @@ def all_files_are_jsons(sbom_files_to_merge: Paths) -> Paths:
return all_files_are_jsons(enough_files_for_merge(list(set(sbom_files_to_merge))))


def merge_relationships(
relationships_list: List[List[SPDXRelation]], doc_ids: List[str], packages: List[SPDXPackage]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A list is a sequence of variable length, the docstring states that the function takes two relationship lists. If it is just two then it would make more sense to have 'this' and 'that' r_lists and to merge them. Or some better name which would let everyone see that there are two distinct entities to be processed. 'main' and 'other' are good candidates given that those are already used in the code.

If it is not about merging two lists then I strongly recommend making it about merging two lists and then reducing any list of lists with it if needed.

Copy link
Author

@midnightercz midnightercz Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I created the method for two lists but I realized cachi2 actually can take multiple files.

If it is not about merging two lists then I strongly recommend making it about merging two lists and then reducing any list of lists with it if needed.

But that would mean another merge method call. I think merging all together is more efficient and calling merge(A,B) in cascade.

@midnightercz
Copy link
Author

It looks like the scope of this PR keeps growing. It might make sense splitting it into several PRs.

This is the last change. It won't grow more

Support for SPDX format was added to fetch-depds command and also
to merge_syft_sboms.
No changes were made in particular package manager generating components
which are then converted to cyclonedx format. SPDX sbom can be obtained
by calling Sbom.to_spdx().
New switch sbom-type was added to merge_syfy_sboms, so user can choose
which output format should be generated - default is cyclonedx.
Once all tooling is ready to consume spdx sboms, cutoff changes
in this repository can be started.

Signed-off-by: Jindrich Luza <[email protected]>
…file_spdx

- package hash calculation helper
- better sbom_name attribute description

Signed-off-by: Jindrich Luza <[email protected]>
SPDXRef-DocumentRoot-File- includes all spdx packages and is set
to be described by SPDXRef-DOCUMENT. This way of spdx generation
is closer to way syft generates spdx

Signed-off-by: Jindrich Luza <[email protected]>
- minor refactoring
- renamed tests
- fixed typos

Signed-off-by: Jindrich Luza <[email protected]>
Signed-off-by: Jindrich Luza <[email protected]>
- Packages with no versions are deduplicated

Signed-off-by: Jindrich Luza <[email protected]>
Signed-off-by: Jindrich Luza <[email protected]>
- renamed middles to envelopes

Signed-off-by: Jindrich Luza <[email protected]>
Signed-off-by: Jindrich Luza <[email protected]>
- improved spdx merge tests

Signed-off-by: Jindrich Luza <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants