
CONCEPT Cited by 1 source

Container-extracted SBOM

Definition

A container-extracted SBOM is a Software Bill of Materials produced by scanning the built container image rather than the source tree or package lock-files. The scanner reads the layered filesystem, identifies OS-level packages (via /var/lib/dpkg, /var/lib/rpm, apk indexes), language-ecosystem package metadata (package-lock.json, go.sum, pom.xml artifacts, *.dist-info/METADATA), and binary markers left by package managers, then emits a unified SBOM spanning all of them.

The canonical tool is syft; its output, in CycloneDX (systems/cyclonedx) or SPDX (systems/spdx) format, is consumed by grype for CVE matching.
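As a rough illustration of what "a unified SBOM spanning all of them" means for a consumer, the sketch below groups the components of a CycloneDX-style JSON document by the ecosystem encoded in each package URL (purl). The sample document is hand-written and hypothetical, and `components_by_ecosystem` is an illustrative helper, not part of syft or grype; real scanner output carries many more fields per component.

```python
import json
from collections import defaultdict

# Hypothetical, hand-written CycloneDX-style fragment (not real scanner output).
SAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "components": [
    {"type": "library", "name": "libssl3", "version": "3.0.11-1~deb12u2",
     "purl": "pkg:deb/debian/libssl3@3.0.11-1~deb12u2"},
    {"type": "library", "name": "requests", "version": "2.31.0",
     "purl": "pkg:pypi/requests@2.31.0"},
    {"type": "library", "name": "urllib3", "version": "2.0.7",
     "purl": "pkg:pypi/urllib3@2.0.7"}
  ]
}
"""


def components_by_ecosystem(sbom: dict) -> dict:
    """Group "name@version" strings by the purl type (deb, pypi, maven, ...)."""
    grouped = defaultdict(list)
    for component in sbom.get("components", []):
        purl = component.get("purl", "")
        # A purl has the shape "pkg:<type>/<namespace>/<name>@<version>".
        if purl.startswith("pkg:"):
            ecosystem = purl[len("pkg:"):].split("/", 1)[0]
        else:
            ecosystem = "unknown"
        grouped[ecosystem].append(f'{component["name"]}@{component["version"]}')
    return dict(grouped)


grouped = components_by_ecosystem(json.loads(SAMPLE_SBOM))
for ecosystem, names in sorted(grouped.items()):
    print(ecosystem, names)
```

OS-level and language-level packages end up in the same list and differ only in their purl type, which is what lets one downstream matcher handle both.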

Why it matters: source-tree scans miss things

A source-tree scan only captures what the language package manager knows about. It cannot see:

  • OS packages contributed by the base image. If your base image (for example FROM python:3.11-slim) ships a vulnerable OpenSSL library package, a source-tree pip-audit run never sees it. The container-extracted SBOM does.
  • Vendored copies not in the lock-file. Code copied into the repo (common in older Go / C++ codebases) is invisible to the language package manager but visible to the container scan.
  • What actually shipped, not what the build intended to ship. A multi-stage Docker build might discard build-time dependencies in the final stage; source-tree scans don't know this. The container scan sees only the runtime image.
  • Contributions from tooling. Shell utilities, language runtimes, compiled binaries added by RUN steps — all invisible to source-tree scanners.
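The gaps listed above can be made concrete by diffing the two inventories. The following is a minimal sketch with hypothetical package sets; the (ecosystem, name) tuples and both helper functions are assumptions for illustration, not the output format of any particular scanner.

```python
# Hypothetical inventories, as (ecosystem, name) pairs.
source_tree_scan = {
    ("pypi", "requests"),
    ("pypi", "urllib3"),
    ("pypi", "pytest"),          # dev dependency, never copied into the final stage
}
container_scan = {
    ("pypi", "requests"),
    ("pypi", "urllib3"),
    ("deb", "libssl3"),          # inherited from the base image
    ("deb", "ca-certificates"),  # inherited from the base image
    ("deb", "curl"),             # installed by a RUN step
}


def only_in_container(source: set, container: set) -> list:
    """Packages that shipped but that the source-tree scan never saw."""
    return sorted(container - source)


def only_in_source(source: set, container: set) -> list:
    """Packages the build declared but that did not reach the runtime image."""
    return sorted(source - container)


print("missed by source-tree scan:", only_in_container(source_tree_scan, container_scan))
print("declared but not shipped:  ", only_in_source(source_tree_scan, container_scan))
```

The first list corresponds to the base-image and tooling bullets; the second corresponds to the "what actually shipped" bullet.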

Failure modes

  • Uber-jar metadata loss (concepts/uber-jar-metadata-loss) is a container-scan failure mode too: if the JVM app ships as a shaded uber-jar inside the image, the scanner sees one jar and misses the constituent libraries. The remediation is to fix the build (stop shading) rather than to change what is scanned.
  • Scratch / distroless images with binaries that bundle their dependencies statically (Go, Rust) can be harder to scan because the filesystem has no package metadata; scanners fall back to binary introspection heuristics with lower fidelity.
  • Layer caching can surface stale packages. If the base image layer isn't rebuilt, the SBOM continues to report old OS-package versions. Discipline: rebuild the image regularly even when app code hasn't changed.
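One way to enforce the rebuild discipline from the last bullet is to flag images whose oldest layer exceeds an age threshold. The sketch below assumes you already have per-layer creation timestamps as ISO 8601 strings with whole-second precision (real image metadata may carry more fractional digits and would need trimming first); the 30-day threshold and all names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_LAYER_AGE = timedelta(days=30)  # assumed policy, not taken from the source


def parse_created(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing "Z" is rewritten for older Pythons."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def oldest_layer_age(layer_created: list, now: datetime) -> timedelta:
    """Age of the oldest layer, used here as a proxy for base-image staleness."""
    oldest = min(parse_created(ts) for ts in layer_created)
    return now - oldest


def needs_rebuild(layer_created: list, now: datetime,
                  max_age: timedelta = MAX_LAYER_AGE) -> bool:
    return oldest_layer_age(layer_created, now) > max_age


NOW = datetime(2023, 4, 12, tzinfo=timezone.utc)
cached_base = ["2022-11-15T00:00:00Z", "2023-04-11T09:30:00Z"]  # old base, new app layer
fresh_build = ["2023-04-05T00:00:00Z", "2023-04-11T09:30:00Z"]

print(needs_rebuild(cached_base, NOW))
print(needs_rebuild(fresh_build, NOW))
```

Checking the oldest layer rather than the image's own build time matters here: an image rebuilt yesterday on top of a cached months-old base layer still reports the old OS packages.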

Canonical wiki instance (Zalando 2023-04-12)

Zalando's SBOM pipeline scans the container image of every deployed application, not the source repo. The post states this explicitly: "We publish a curated data set containing dependency data from the SBOM for every application we deploy, based on its Container image" (Source: sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game). This choice is what makes cross-language analytics over the fleet tractable: whether an app is written in Python, Go, Java, Kotlin, Scala, or JavaScript, its SBOM has a uniform shape rooted in the container layer structure.
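To show why a uniform per-image shape helps, here is a minimal sketch of a fleet-wide query: "which deployed applications contain a given package?" The application names, purls, and the `apps_using` helper are all hypothetical and do not describe Zalando's actual data set or schema.

```python
# Hypothetical fleet inventory: application name -> purls from its image SBOM.
FLEET = {
    "checkout-service": [
        "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
        "pkg:deb/debian/openssl@3.0.11",
    ],
    "search-api": [
        "pkg:pypi/requests@2.31.0",
        "pkg:deb/debian/openssl@3.0.11",
    ],
    "catalog-worker": [
        "pkg:golang/github.com/gorilla/mux@1.8.0",
    ],
}


def apps_using(fleet: dict, purl_prefix: str) -> list:
    """Return the applications whose SBOM lists a purl starting with the prefix."""
    return sorted(
        app
        for app, purls in fleet.items()
        if any(purl.startswith(purl_prefix) for purl in purls)
    )


print(apps_using(FLEET, "pkg:deb/debian/openssl@"))
print(apps_using(FLEET, "pkg:maven/org.apache.logging.log4j/log4j-core@"))
```

The same query works for a Java, Python, or Go service because each entry is just a list of purls, regardless of which build tool produced the application.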

Trade-offs vs source-tree scan

| Dimension | Source-tree | Container-extracted |
|---|---|---|
| OS packages | ❌ invisible | ✅ captured |
| Vendored copies | ❌ partial | ✅ captured |
| Fidelity to shipped artifact | ❌ build-intent only | ✅ what ships |
| Speed | ✅ fast (no build) | ❌ requires built image |
| CI integration | ✅ pre-build stage | ❌ post-build stage |
| Per-language fidelity | ✅ high (native metadata) | ⚠️ depends on scanner heuristics |
| Supports fleet analytics | ❌ repo-by-repo shape | ✅ uniform per-image shape |

Zalando's choice — container-extracted — optimises for the fleet analytics row, at the cost of scanning post-build rather than pre-build.
