CONCEPT Cited by 1 source
Software Bill of Materials (SBOM)¶
Definition¶
A Software Bill of Materials (SBOM) is a machine-readable
inventory of every package and library that composes a
software artifact. Each entry records, at minimum, the
name, version, and license of a component; richer
formats add supplier, cryptographic hash, package URL
(purl), dependency relationships, and known-vulnerability
references.
The canonical machine-readable formats are CycloneDX (OWASP) and SPDX (Linux Foundation). Both encode the same conceptual object with different schemas, and both are emitted by tools like syft and consumed by tools like grype for CVE correlation.
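As a rough illustration of what such a document looks like to a consumer, the sketch below parses a hand-written, minimal CycloneDX-style JSON document and flattens each component to name, version, purl, and license identifiers. The sample data and the `list_components` helper are assumptions made for illustration, not output from syft or any other tool.

```python
import json

# Hand-written, minimal CycloneDX-style document (illustrative, not tool output).
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {
      "type": "library",
      "name": "log4j-core",
      "version": "2.14.0",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.0",
      "licenses": [{"license": {"id": "Apache-2.0"}}]
    },
    {
      "type": "library",
      "name": "left-pad",
      "version": "1.3.0",
      "purl": "pkg:npm/left-pad@1.3.0"
    }
  ]
}
"""


def list_components(sbom: dict) -> list[dict]:
    """Flatten each component to name, version, purl, and SPDX license ids."""
    rows = []
    for comp in sbom.get("components", []):
        # License entries may carry an "id", a free-text "name", or an
        # "expression"; this sketch keeps only the SPDX ids.
        licenses = [
            entry["license"]["id"]
            for entry in comp.get("licenses", [])
            if "id" in entry.get("license", {})
        ]
        rows.append({
            "name": comp["name"],
            "version": comp.get("version"),
            "purl": comp.get("purl"),
            "licenses": licenses,
        })
    return rows


components = list_components(json.loads(SBOM_JSON))
for row in components:
    print(row)
```

Optional fields are read with `.get()` because real SBOMs frequently omit licenses or purls for some components.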
What an SBOM captures¶
For each component:
- Identity: name + version + optional purl (Package URL, e.g. pkg:maven/org.apache.logging.log4j/log4j-core@2.14.0).
- License: SPDX identifier (Apache-2.0, MIT, BUSL-1.1, etc.). This is what makes license-change impact analysis possible: the 2022 Akka re-licensing (Apache-2.0 → Business Source License) turned into a scoped SBOM query across Zalando's fleet.
- Relationships: whether a component is a direct or a transitive dependency; the parent → child edges form a dependency graph captured in the SBOM.
- Supplier / source (optional in CycloneDX, commonly populated for SPDX).
For each artifact (container image, jar, binary):
- OS-level packages (if the SBOM is extracted from a container — see concepts/container-extracted-sbom).
- Application-level dependencies (everything the language package manager knows about).
- Build-time metadata (timestamp, builder identity, upstream source commit hash).
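The direct-vs-transitive distinction mentioned above can be recovered from the dependency graph with a breadth-first walk. The sketch below assumes a CycloneDX-style layout in which `metadata.component` names the root artifact and `dependencies` lists `ref` / `dependsOn` edges; the sample graph and the `dependency_depths` helper are hypothetical, and some generators leave the graph sparse or empty.

```python
from collections import deque


def dependency_depths(sbom: dict) -> dict[str, int]:
    """Return the shortest distance from the root artifact to each component.

    Depth 1 means a direct dependency; depth 2 or more means transitive.
    """
    root = sbom["metadata"]["component"]["bom-ref"]
    edges = {
        dep["ref"]: dep.get("dependsOn", [])
        for dep in sbom.get("dependencies", [])
    }
    depths = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in depths:  # first visit is the shortest path in BFS
                depths[child] = depths[node] + 1
                queue.append(child)
    return depths


# Hypothetical graph: app -> akka-http -> akka-actor, and app -> slf4j-api.
sample_sbom = {
    "metadata": {"component": {"bom-ref": "app"}},
    "dependencies": [
        {"ref": "app", "dependsOn": ["akka-http", "slf4j-api"]},
        {"ref": "akka-http", "dependsOn": ["akka-actor"]},
        {"ref": "akka-actor", "dependsOn": []},
    ],
}

depths = dependency_depths(sample_sbom)
direct = sorted(ref for ref, d in depths.items() if d == 1)
transitive = sorted(ref for ref, d in depths.items() if d >= 2)
print("direct:", direct)
print("transitive:", transitive)
```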
Why SBOMs exist¶
Three forcing functions industry-wide:
- Mass vulnerability response. Log4Shell (CVE-2021-44228, December 2021) forced every large software org to answer "which of our applications contain log4j-core between 2.0-beta9 and 2.14.1?" in hours, not weeks. Organisations without an SBOM corpus scrambled through per-repo greps, build-system introspection, and manual audits. Those with an SBOM corpus ran a single query. Zalando's post frames this as the defining use-case (Source: sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game).
- License-change impact assessment. Open-source relicensing (Akka, Redis, Elastic, HashiCorp products) turned a hypothetical risk into concrete migration-scoping work. "Which of our apps link a module of the newly relicensed library, at what version, at what depth (direct vs transitive)?" is an SBOM query.
- Regulatory compliance. US Executive Order 14028 (May 2021) and NIST SP 800-218 / SSDF require SBOMs for software sold to the federal government. EU Cyber Resilience Act (adopted 2024) extends similar obligations.
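The Log4Shell question quoted above reduces to a version-range filter over the SBOM corpus. The sketch below runs that filter over a small in-memory corpus; the application names, the corpus shape, and the simplified `version_key` comparator are all assumptions made for illustration (real Maven version ordering has more rules than this handles).

```python
import re


def version_key(version: str) -> tuple:
    """Sortable key for simple Maven-style versions ('2.14.1', '2.0-beta9').

    Simplification: only dotted numeric bases with an optional
    '-<word><number>' qualifier; a release sorts after its pre-releases.
    """
    base, _, qualifier = version.partition("-")
    nums = ([int(part) for part in base.split(".")] + [0, 0, 0])[:3]
    if not qualifier:
        return (*nums, 1, "", 0)
    match = re.match(r"([A-Za-z]+)(\d*)", qualifier)
    if match is None:
        return (*nums, 0, qualifier, 0)
    return (*nums, 0, match.group(1).lower(), int(match.group(2) or 0))


# Affected range as stated in the text: 2.0-beta9 through 2.14.1 inclusive.
LOW = version_key("2.0-beta9")
HIGH = version_key("2.14.1")

# Hypothetical corpus: application name -> (component name, version) pairs.
corpus = {
    "checkout": [("log4j-core", "2.14.0"), ("jackson-databind", "2.13.0")],
    "search": [("log4j-core", "2.17.1")],
    "billing": [("logback-classic", "1.2.6")],
    "legacy-batch": [("log4j-core", "2.0-beta9")],
}


def affected_apps(sbom_corpus: dict) -> list[str]:
    """Applications shipping log4j-core inside the affected range."""
    return sorted(
        app
        for app, comps in sbom_corpus.items()
        if any(
            name == "log4j-core" and LOW <= version_key(version) <= HIGH
            for name, version in comps
        )
    )


affected = affected_apps(corpus)
print(affected)
```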
Generation loci¶
An SBOM can be generated at three different points in the pipeline, each trading off accuracy vs ease:
- Source-tree scan (npm audit, cargo tree, go mod graph, IDE plugins). Fastest, most ergonomic; but captures only what the language package manager knows and can miss runtime plugins, vendored copies, or OS-level packages pulled in by the base image.
- Container-image scan (syft, grype on the built image). Canonicalised as concepts/container-extracted-sbom. Captures OS packages + whatever actually shipped, not what the build intended to ship. Zalando's choice (Source: sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game).
- Runtime scan (inspect the running process). Rarest; useful for languages with dynamic loading (Python plugins, JVM Class.forName). Not addressed by the Zalando post.
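For the container-image locus, a minimal wrapper around the syft CLI might look like the sketch below. It assumes syft is installed and on PATH and that the image reference is pullable; the helper names and the example image reference are hypothetical, and nothing here reflects how Zalando actually invokes the tool.

```python
import json
import shutil
import subprocess


def syft_command(image: str) -> list[str]:
    """Build the argv for a CycloneDX JSON scan of a container image."""
    return ["syft", image, "-o", "cyclonedx-json"]


def extract_sbom(image: str) -> dict:
    """Run syft against an image and return the parsed SBOM document."""
    if shutil.which("syft") is None:
        raise RuntimeError("syft is not installed or not on PATH")
    result = subprocess.run(
        syft_command(image),
        check=True,           # raise if syft exits non-zero
        capture_output=True,  # syft writes the SBOM to stdout
        text=True,
    )
    return json.loads(result.stdout)


# Example invocation (hypothetical image reference):
#   sbom = extract_sbom("registry.example.com/checkout:1.4.2")
#   print(len(sbom.get("components", [])), "components")
```

Scanning the built image rather than the source tree is what lets the result include OS packages from the base image alongside application dependencies.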
Failure modes¶
- concepts/uber-jar-metadata-loss: JVM builds that shade / fat-jar / uber-jar all dependencies into a single archive lose the per-dep metadata the SBOM scanner needs. The SBOM comes back empty or severely undercounted.
- Divergent package identity across scans. The same library can appear under different group-IDs / artifact-names across JVM ecosystems, so cross-app correlation needs more than string equality on the name.
- Transitive-reachability gap. An SBOM says "library X is present." It doesn't say "code path Y in my app calls into X." concepts/transitive-dependency-reachability closes this gap at the per-binary altitude; the SBOM closes it at the fleet altitude; both are needed for "am I actually exploitable via this CVE?" analysis.
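One way to soften the divergent-identity problem is to correlate on a normalised key derived from the purl rather than on the raw name. The sketch below strips the version and a trailing Scala binary-version suffix (a common source of artifact-name variation in JVM builds). The `identity_key` helper is hypothetical and is not a full purl parser; renamed or relocated group-IDs would still need an explicit alias table.

```python
import re

# Scala artifacts conventionally carry a binary-version suffix (_2.12, _2.13, _3).
_SCALA_SUFFIX = re.compile(r"_(2\.1[0-3]|3)$")


def identity_key(purl: str) -> tuple[str, str, str]:
    """Reduce a purl to (type, namespace, name) for cross-app correlation.

    Simplified: drops qualifiers, subpath, and version, and does not
    percent-decode segments as the purl specification requires.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    body = purl[4:].split("#", 1)[0].split("?", 1)[0]
    body, _, _version = body.partition("@")
    pkg_type, _, path = body.partition("/")
    namespace, _, name = path.rpartition("/")
    return (pkg_type.lower(), namespace, _SCALA_SUFFIX.sub("", name))


key_213 = identity_key("pkg:maven/com.typesafe.akka/akka-actor_2.13@2.6.20")
key_212 = identity_key("pkg:maven/com.typesafe.akka/akka-actor_2.12@2.5.32")
print(key_213, key_213 == key_212)
```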
Canonical wiki instance (Zalando 2023-04-12)¶
Zalando extracts an SBOM from every deployed container image,
publishes it as a curated dataset in the company data
lake, and makes it SQL-queryable by any engineer
(patterns/sbom-as-queryable-data-lake-asset). This
transforms dependency governance from a per-repo
pull-request-chasing game (systems/dependabot, systems/scala-steward,
maven-versions-plugin) into a cross-fleet analytics
workload. Named use-cases: Log4Shell mass-patch,
Akka license-change footprint, AWS SDK
full-vs-modules bloat audit (Source:
sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game).
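To make the "SQL-queryable" idea concrete, the sketch below loads a few SBOM rows into an in-memory SQLite table and runs an Akka-footprint query. The table name, columns, and rows are hypothetical; the actual schema of Zalando's data-lake dataset is not described here, and SQLite merely stands in for whatever query engine the data lake exposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical flattened layout: one row per (application, component).
conn.execute(
    """
    CREATE TABLE sbom_components (
        application TEXT,
        name        TEXT,
        version     TEXT,
        license     TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO sbom_components VALUES (?, ?, ?, ?)",
    [
        ("checkout", "akka-actor_2.13", "2.6.20", "Apache-2.0"),
        ("checkout", "akka-stream_2.13", "2.6.20", "Apache-2.0"),
        ("search", "akka-http_2.13", "10.2.10", "Apache-2.0"),
        ("billing", "log4j-core", "2.14.0", "Apache-2.0"),
    ],
)

# Which applications ship Akka modules, and how many distinct ones?
akka_footprint = conn.execute(
    """
    SELECT application, COUNT(DISTINCT name) AS akka_modules
    FROM sbom_components
    WHERE name LIKE 'akka-%'
    GROUP BY application
    ORDER BY application
    """
).fetchall()

print(akka_footprint)
```

The same table shape serves the other named use-cases by changing only the `WHERE` clause, which is the main appeal of treating the SBOM corpus as an ordinary dataset.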
Seen in¶
- sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game — canonical wiki instance. Zalando's SBOM-as-data-lake platform and the log4j / Akka / AWS-SDK-bloat use-cases.
Related¶
- concepts/container-extracted-sbom — the specific generation-locus choice.
- concepts/uber-jar-metadata-loss — JVM-specific gotcha.
- concepts/transitive-dependency-reachability — the per-binary complement to SBOM fleet analysis.
- patterns/sbom-as-queryable-data-lake-asset — the architectural pattern for making SBOMs organisationally useful.
- patterns/vulnerability-fleet-sweep-via-sbom-query — the mass-patch playbook SBOMs enable.
- patterns/dependency-update-discipline — the per-repo tactical layer the SBOM corpus complements.
- systems/cyclonedx · systems/spdx — SBOM formats.
- systems/syft · systems/grype — SBOM generator + vulnerability scanner.