
CONCEPT Cited by 1 source

Software Bill of Materials (SBOM)

Definition

A Software Bill of Materials (SBOM) is a machine-readable inventory of every package and library that composes a software artifact. Each entry records, at minimum, the name, version, and license of a component; richer formats add supplier, cryptographic hash, package URL (purl), dependency relationships, and known-vulnerability references.

The canonical machine-readable formats are CycloneDX (OWASP) and SPDX (Linux Foundation). Both encode the same conceptual object with different schemas, and both are emitted by tools like syft and consumed by tools like grype for CVE correlation.

What an SBOM captures

For each component:

  • Identity: name + version + optional purl (Package URL, e.g. pkg:maven/org.apache.logging.log4j/log4j-core@2.14.0).
  • License: SPDX identifier (Apache-2.0, MIT, BUSL-1.1, etc.). This is what makes license-change impact analysis possible: the 2022 Akka relicensing (Apache-2.0 → Business Source License) became a scoped SBOM query across Zalando's fleet.
  • Relationships: whether a component is a direct vs transitive dependency; the parent → child edges form a dependency graph captured in the SBOM.
  • Supplier / source (optional in CycloneDX, commonly populated for SPDX).
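
The per-component fields above map fairly directly onto a CycloneDX JSON document. The following is a minimal sketch, assuming a hand-written SBOM: the `example-app` root and its two components are made up for illustration, and only the `metadata`, `components`, `licenses`, and `dependencies` fields of the CycloneDX JSON schema are read. It flattens the document into one row per component and marks each as direct or transitive relative to the root.

```python
import json

# Hypothetical, hand-written CycloneDX-style SBOM; real documents carry many more fields.
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "metadata": {"component": {"bom-ref": "app", "type": "application",
                             "name": "example-app", "version": "1.0.0"}},
  "components": [
    {"bom-ref": "log4j-core", "type": "library",
     "name": "log4j-core", "version": "2.14.0",
     "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.0",
     "licenses": [{"license": {"id": "Apache-2.0"}}]},
    {"bom-ref": "log4j-api", "type": "library",
     "name": "log4j-api", "version": "2.14.0",
     "purl": "pkg:maven/org.apache.logging.log4j/log4j-api@2.14.0",
     "licenses": [{"license": {"id": "Apache-2.0"}}]}
  ],
  "dependencies": [
    {"ref": "app", "dependsOn": ["log4j-core"]},
    {"ref": "log4j-core", "dependsOn": ["log4j-api"]}
  ]
}
"""


def summarise_components(sbom: dict) -> list[dict]:
    """Flatten an SBOM into one row per component, flagging direct dependencies."""
    root_ref = sbom["metadata"]["component"]["bom-ref"]
    # Map each bom-ref to the refs it depends on (the parent -> child edges).
    edges = {d["ref"]: d.get("dependsOn", []) for d in sbom.get("dependencies", [])}
    direct_refs = set(edges.get(root_ref, []))

    rows = []
    for comp in sbom.get("components", []):
        # Only SPDX ids are collected; named licenses and expressions are skipped here.
        license_ids = [
            entry["license"]["id"]
            for entry in comp.get("licenses", [])
            if "id" in entry.get("license", {})
        ]
        rows.append({
            "name": comp["name"],
            "version": comp["version"],
            "purl": comp.get("purl"),
            "licenses": license_ids,
            "direct": comp.get("bom-ref") in direct_refs,
        })
    return rows


rows = summarise_components(json.loads(SBOM_JSON))
for row in rows:
    print(row)
```

Anything not listed under the root's `dependsOn` is treated as transitive, which is a simplification: a component missing from the `dependencies` array entirely would also land in that bucket.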

For each artifact (container image, jar, binary):

  • OS-level packages (if the SBOM is extracted from a container — see concepts/container-extracted-sbom).
  • Application-level dependencies (everything the language package manager knows about).
  • Build-time metadata (timestamp, builder identity, upstream source commit hash).

Why SBOMs exist

Three forcing functions industry-wide:

  1. Mass vulnerability response. Log4Shell (CVE-2021-44228, December 2021) forced every large software org to answer "which of our applications contain log4j-core between 2.0-beta9 and 2.14.1?" in hours, not weeks. Organisations without an SBOM corpus scrambled through per-repo greps, build-system introspection, and manual audits. Those with an SBOM corpus ran a single query. Zalando's post frames this as the defining use-case (Source: sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game).
  2. License-change impact assessment. Open-source relicensing (Akka, Redis, Elastic, HashiCorp products) turned from a hypothetical risk into a concrete migration-scoping exercise. "Which of our apps link a module of the newly relicensed library, at what version, at what depth (direct vs transitive)?" is an SBOM query.
  3. Regulatory compliance. US Executive Order 14028 (May 2021) and NIST SP 800-218 / SSDF require SBOMs for software sold to the federal government. EU Cyber Resilience Act (adopted 2024) extends similar obligations.
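
The mass-vulnerability case reduces to a version-range filter over the SBOM corpus. A rough sketch, assuming the corpus has already been loaded into a dictionary of app name to component list (the app names and the in-memory shape are hypothetical, not Zalando's), and using a deliberately crude version comparison that drops qualifiers such as `-beta9`:

```python
import re

# Log4Shell range from the text: 2.0-beta9 through 2.14.1.
# Qualifiers are dropped, so 2.0-beta9 is approximated as 2.0.0 (assumption).
AFFECTED_LOW = (2, 0, 0)
AFFECTED_HIGH = (2, 14, 1)


def numeric_version(version: str) -> tuple:
    """Return (major, minor, patch), ignoring any -qualifier or +build suffix."""
    core = re.split(r"[-+]", version, maxsplit=1)[0]
    parts = [int(p) for p in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad "2.0" to (2, 0, 0)
    return tuple(parts)


def affected_apps(fleet: dict) -> list:
    """List (app, version) pairs whose log4j-core falls inside the affected range."""
    hits = []
    for app, components in fleet.items():
        for comp in components:
            if comp["name"] != "log4j-core":
                continue
            if AFFECTED_LOW <= numeric_version(comp["version"]) <= AFFECTED_HIGH:
                hits.append((app, comp["version"]))
    return sorted(hits)


# Hypothetical fleet: three apps with their extracted components.
fleet = {
    "checkout": [{"name": "log4j-core", "version": "2.14.0"}],
    "search": [{"name": "log4j-core", "version": "2.17.1"}],
    "legacy": [
        {"name": "log4j-core", "version": "2.0-beta9"},
        {"name": "slf4j-api", "version": "1.7.36"},
    ],
}

print(affected_apps(fleet))
```

Because qualifiers are ignored, earlier 2.0 pre-releases would also match; a production query would want a proper Maven version comparator rather than this approximation.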

Generation loci

An SBOM can be generated at three different points in the pipeline, each trading off accuracy vs ease:

  • Source-tree scan (npm audit, cargo tree, go mod graph, IDE plugins). Fastest and most ergonomic, but it captures only what the language package manager knows and can miss runtime plugins, vendored copies, and OS-level packages pulled in by the base image.
  • Container-image scan (syft, grype on the built image). Canonicalised as concepts/container-extracted-sbom. Captures OS packages + whatever actually shipped, not what the build intended to ship. Zalando's choice (Source: sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game).
  • Runtime scan (inspect the running process). Rarest; useful for languages with dynamic loading (Python plugins, JVM Class.forName). Not addressed by the Zalando post.
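
For the container-image locus, extraction is typically a single scanner invocation per built image. The sketch below wraps the syft CLI from Python; it assumes syft is installed and on `PATH`, that the `syft <image> -o cyclonedx-json` form is accepted by the installed version (flag spellings may differ between releases), and the image reference is a placeholder.

```python
import shutil
import subprocess
from typing import Optional


def syft_command(image: str, output_format: str = "cyclonedx-json") -> list:
    """Build the argv for scanning a container image with syft."""
    return ["syft", image, "-o", output_format]


def extract_sbom(image: str) -> Optional[str]:
    """Return the SBOM as a JSON string, or None when syft is not installed."""
    if shutil.which("syft") is None:
        return None
    result = subprocess.run(
        syft_command(image), capture_output=True, text=True, check=True
    )
    return result.stdout


# Placeholder image reference; nothing is executed here, only the argv is shown.
print(syft_command("registry.example/app:1.0"))
```

Keeping the argv construction separate from execution makes the command easy to log or test without pulling an image.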

Failure modes

  • concepts/uber-jar-metadata-loss: JVM builds that shade / fat-jar / uber-jar all dependencies into a single archive lose the per-dep metadata the SBOM scanner needs. The SBOM comes back empty or severely undercounted.
  • Divergent package identity across scans. Same library can appear under different group-IDs / artifact-names across JVM ecosystems, making cross-app correlation harder than string-equality on the name.
  • Transitive-reachability gap. An SBOM says "library X is present." It doesn't say "code path Y in my app calls into X." concepts/transitive-dependency-reachability answers the call-path question at the per-binary altitude; the SBOM answers the presence question at the fleet altitude; both are needed for "am I actually exploitable via this CVE?" analysis.
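
One possible mitigation for the divergent-identity problem, offered here as an assumption rather than something the Zalando post describes, is to correlate on a parsed purl instead of the display name. The parser below is a simplified sketch: it handles the `pkg:type/namespace/name@version` shape, discards qualifiers and subpaths, and does no percent-decoding.

```python
from typing import NamedTuple, Optional


class Purl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Parse the common pkg:type/namespace/name@version form (simplified)."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    rest = purl[len("pkg:"):]
    # Drop the optional #subpath and ?qualifiers parts.
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    if "@" in rest:
        path, version = rest.rsplit("@", 1)
    else:
        path, version = rest, None
    segments = path.strip("/").split("/")
    if len(segments) < 2:
        raise ValueError(f"purl needs a type and a name: {purl!r}")
    namespace = "/".join(segments[1:-1]) or None
    return Purl(segments[0].lower(), namespace, segments[-1], version)


def identity_key(purl: str) -> tuple:
    """Version-independent key for cross-app correlation."""
    return parse_purl(purl)[:3]


# Two hypothetical scanner outputs naming the same library differently.
entry_a = {"name": "log4j-core",
           "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.0"}
entry_b = {"name": "org.apache.logging.log4j:log4j-core",
           "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.17.1"}

print(identity_key(entry_a["purl"]) == identity_key(entry_b["purl"]))
```

This only helps when the scanners agree on coordinates; a library that was genuinely relocated to a different group-ID would still need an explicit alias table.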

Canonical wiki instance (Zalando 2023-04-12)

Zalando extracts an SBOM from every deployed container image, publishes it as a curated dataset in the company data lake, and makes it SQL-queryable by any engineer (patterns/sbom-as-queryable-data-lake-asset). This transforms dependency governance from a per-repo pull-request-chasing game (systems/dependabot, systems/scala-steward, maven-versions-plugin) into a cross-fleet analytics workload. Named use-cases: Log4Shell mass-patch, Akka license-change footprint, AWS SDK full-vs-modules bloat audit (Source: sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game).
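
To give a feel for what "SQL-queryable" means in practice, the sketch below loads a few flattened SBOM rows into an in-memory SQLite table and runs an Akka-footprint query. The table name, columns, app names, and rows are all hypothetical; the post does not describe Zalando's actual schema or query engine.

```python
import sqlite3

# Hypothetical flattened SBOM rows: one row per (app, component).
rows = [
    ("checkout", "akka-actor_2.13", "2.6.20", "Apache-2.0", 1),
    ("search", "akka-stream_2.13", "2.7.0", "BUSL-1.1", 0),
    ("catalog", "log4j-core", "2.17.1", "Apache-2.0", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sbom_components ("
    "app TEXT, name TEXT, version TEXT, license TEXT, is_direct INTEGER)"
)
conn.executemany("INSERT INTO sbom_components VALUES (?, ?, ?, ?, ?)", rows)

# Which apps ship any Akka module, at what version, and is it a direct dependency?
AKKA_FOOTPRINT = """
    SELECT app, name, version, is_direct
    FROM sbom_components
    WHERE name LIKE 'akka-%'
    ORDER BY app, name
"""

footprint = conn.execute(AKKA_FOOTPRINT).fetchall()
for app, name, version, is_direct in footprint:
    depth = "direct" if is_direct else "transitive"
    print(f"{app}: {name} {version} ({depth})")
```

The same table shape would serve the other named use-cases by changing only the `WHERE` clause, which is the main appeal of treating the SBOM corpus as a dataset rather than a per-repo artifact.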

Seen in
