This repository contains the code and data produced for the paper Challenges of Producing Software Bill Of Materials for Java (IEEE Security & Privacy, 2023).
@article{sbomchallenges,
title = {Challenges of Producing Software Bill Of Materials for Java},
journal = {IEEE Security \& Privacy},
year = {2023},
doi = {10.1109/MSEC.2023.3302956},
author = {Musard Balliu and Benoit Baudry and Sofia Bobadilla and Mathias Ekstedt and Martin Monperrus and Javier Ron and Aman Sharma and Gabriel Skoglund and César Soto-Valero and Martin Wittlinger},
url = {http://arxiv.org/pdf/2303.11102},
}
The structure of the repository is as follows:
- sbom-production contains all scripts used for creating CycloneDX SBOM files for each of the 26 study subjects using 6 different SBOM producers.
- ground-truth-production contains all scripts used for extracting a ground truth dataset of dependency trees for each study subject.
- metrics-computation contains all code used for computing metrics relating to the performance of the SBOM tools.
- results-march-2023 contains all experimental data.
- sbom2023_plot contains additional code and resources related to the creation of figures for the paper.
The performance of the following 6 CycloneDX SBOM producers was studied. The versions listed were the latest as of Fri 5 May 2023 13:02:33 CEST.
Producer | Version |
---|---|
Build Info Go | 1.9.3 |
CycloneDX Generator | 8.4.3 |
CycloneDX Maven Plugin | 2.7.8 |
jbom | 1.2.1 |
OpenRewrite | 4.45.0 |
Depscan | 4.1.2 |
The following versions of 26 Java projects using Maven were selected as study subjects:
# | GitHub Repository | Commit Hash | Stable release as of 01.01.23 |
---|---|---|---|
1 | jenkins | ce7e5d7 | 2.384 |
2 | mybatis-3 | c195f12 | 3.5.11 |
3 | flink | c41c8e5 | 1.15.3 |
4 | checkstyle | 233c91b | 10.6.0 |
5 | CoreNLP | f7782ff | 4.5.1 |
6 | neo4j | c082e80 | 5.3.0 |
7 | async-http-client | 7a370af | 2.12.3 |
8 | error-prone | 27de40b | 2.17.0 |
9 | alluxio | d5919d8 | 2.9.0 |
10 | javaparser | 1ae25f3 | 3.15.15 |
11 | undertow | f52b70c | 2.3.2.Final |
12 | webcam-capture | e19125c | 0.3.12 |
13 | handlebars.java | 2afc50f | 4.2.1 |
14 | jooby | f71b551 | 3.0.0.M1 |
15 | tika | 41319f3 | 2.6.0 |
16 | orika | eef8209 | 1.5.4 |
17 | spoon | ee73f43 | 10.2.0 |
18 | accumulo | 706612f | 2.1.0 |
19 | couchdb-lucene | 8554737 | 2.1.0 |
20 | jHiccup | a440bda | 2.0.10 |
21 | vulnerability-assessment-tool | 3d261af | 3.2.5 |
22 | para | 41d9005 | 1.47.2 |
23 | launch4j-maven-plugin | 3f9818e | 2.2.0 |
24 | jacop | 1a395e6 | 4.9.0 |
25 | selenese-runner-java | 3e84e8e | 4.2.0 |
26 | commons-configuration | 59e5152 | 2.8.0 |
If you are interested in reproducing our results, the script reproduce.sh is provided for your convenience. This script will do the following:
- Generate SBOMs for each study subject and SBOM producer.
- Extract ground truth dependency information from each study subject.
- Calculate the accuracy/precision for each SBOM producer and compare these values with our results, outputting whether the values match or not.
⚠️ Please note that this script can take a considerable amount of time (~2 hours on a laptop) since SBOM production needs to be carried out by 6 different producers on 26 different study subjects.
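To illustrate the kind of comparison performed in the last step above, here is a minimal sketch that scores the components listed in a CycloneDX JSON SBOM against a ground-truth set. It is not the code from metrics-computation: the function names, the choice of precision and recall as metrics, the matching on (group, name, version) triples, and the sample data are all assumptions made for illustration. The only thing taken from the CycloneDX format is that a JSON SBOM lists its entries under a top-level `components` array with `group`, `name` and `version` fields.

```python
# A dependency is identified here by (group, name, version).
# This matching rule is an assumption, not necessarily the paper's.
Dependency = tuple[str, str, str]


def components_from_cyclonedx(sbom: dict) -> set[Dependency]:
    """Collect (group, name, version) triples from a parsed CycloneDX JSON SBOM."""
    return {
        (c.get("group", ""), c.get("name", ""), c.get("version", ""))
        for c in sbom.get("components", [])
    }


def precision_recall(
    reported: set[Dependency], truth: set[Dependency]
) -> tuple[float, float]:
    """Return (precision, recall) of the reported set against the ground truth."""
    true_positives = len(reported & truth)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical data, made up for this example.
sbom = {
    "bomFormat": "CycloneDX",
    "components": [
        {"group": "org.example", "name": "lib-a", "version": "1.0.0"},
        {"group": "org.example", "name": "lib-b", "version": "2.0.0"},
    ],
}
truth = {
    ("org.example", "lib-a", "1.0.0"),
    ("org.example", "lib-c", "3.0.0"),
}

precision, recall = precision_recall(components_from_cyclonedx(sbom), truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example the SBOM reports one correct dependency (lib-a), one spurious dependency (lib-b), and misses one (lib-c), so both precision and recall are 0.5.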
The following software is required:
- Java version 17 or newer
- Apache Maven
- Docker
- Python 3.10 or newer
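Before starting a long run, it may help to confirm that the tools listed above are available. The sketch below is not part of the repository. It assumes the usual executable names (`java`, `mvn`, `docker`), which may differ on a given system, and it only checks that the commands are on the PATH and that the running Python interpreter is recent enough. It does not verify the Java version.

```python
import shutil
import sys

# Assumed executable names; adjust them if your installation differs.
REQUIRED_COMMANDS = ("java", "mvn", "docker")
MIN_PYTHON = (3, 10)


def missing_commands(commands=REQUIRED_COMMANDS, which=shutil.which):
    """Return the commands that cannot be found on the PATH."""
    return [cmd for cmd in commands if which(cmd) is None]


def python_is_recent_enough(version=None, minimum=MIN_PYTHON):
    """Check a (major, minor) version against the minimum; defaults to this interpreter."""
    version = sys.version_info[:2] if version is None else version
    return tuple(version) >= minimum


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    if not python_is_recent_enough():
        print("Python 3.10 or newer is required")
    if not missing and python_is_recent_enough():
        print("All checked prerequisites were found")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the defaults are enough.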