CONCEPT

Container-extracted SBOM¶

Definition¶

A container-extracted SBOM is a Software Bill of Materials produced by scanning the built container image rather than the source tree or package lock-files. The scanner reads the layered filesystem, identifies OS-level packages (via /var/lib/dpkg, /var/lib/rpm, apk indexes), language-ecosystem package metadata (package-lock.json, go.sum, pom.xml artifacts, *.dist-info/METADATA), and binary markers left by package managers, then emits a unified SBOM spanning all of them.

The canonical tool is syft; its output (in systems/cyclonedx or systems/spdx format) is consumed by grype for CVE matching.
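To make the "unified SBOM" idea concrete, here is a minimal sketch of reading a CycloneDX JSON document and grouping its components by ecosystem. It assumes the SBOM was produced with something like `syft <image> -o cyclonedx-json` and that each component carries a package URL (`purl`). The sample data and the helper names `purl_type` and `group_by_ecosystem` are illustrative, not part of any tool's API.

```python
from collections import defaultdict


def purl_type(purl: str) -> str:
    """Return the ecosystem segment of a package URL.

    Example: 'pkg:deb/debian/libssl1.1@1.1.1n' -> 'deb'.
    Anything that is not a package URL is reported as 'unknown'.
    """
    if not purl.startswith("pkg:"):
        return "unknown"
    return purl[len("pkg:"):].split("/", 1)[0]


def group_by_ecosystem(sbom: dict) -> dict:
    """Group 'name@version' strings by purl ecosystem (deb, pypi, ...)."""
    groups = defaultdict(list)
    for component in sbom.get("components", []):
        ecosystem = purl_type(component.get("purl", ""))
        label = f"{component['name']}@{component.get('version', '?')}"
        groups[ecosystem].append(label)
    return dict(groups)


# Hypothetical, hand-written SBOM fragment in CycloneDX JSON shape.
SAMPLE_SBOM = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "libssl1.1", "version": "1.1.1n",
         "purl": "pkg:deb/debian/libssl1.1@1.1.1n?arch=amd64"},
        {"type": "library", "name": "requests", "version": "2.31.0",
         "purl": "pkg:pypi/requests@2.31.0"},
        {"type": "library", "name": "urllib3", "version": "2.0.7",
         "purl": "pkg:pypi/urllib3@2.0.7"},
    ],
}

if __name__ == "__main__":
    for ecosystem, packages in group_by_ecosystem(SAMPLE_SBOM).items():
        print(ecosystem, packages)
```

OS-level and language-level packages end up in the same list of components, which is what lets a single downstream matcher such as grype treat them alike.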

Why it matters: source-tree scans miss things¶

A source-tree scan only captures what the language package manager knows about. It cannot see:

OS packages contributed by the base image. If your FROM python:3.11-slim base image ships a vulnerable libssl1.1, a source-tree pip-audit run never sees it. The container-extracted SBOM does.
Vendored copies not in the lock-file. Code copied into the repo (common in older Go / C++ codebases) is invisible to the language package manager but visible to the container scan.
What actually shipped, not what the build intended to ship. A multi-stage Docker build might discard build-time dependencies in the final stage; source-tree scans don't know this. The container scan sees only the runtime image.
Contributions from tooling. Shell utilities, language runtimes, compiled binaries added by RUN steps — all invisible to source-tree scanners.
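The gap described in the list above can be sketched as a set difference between what a source-tree scan reports and what the container SBOM contains. This is a minimal illustration, assuming both sides identify packages by package URL; the function names and the sample data are hypothetical.

```python
def strip_qualifiers(purl: str) -> str:
    """Drop '?qualifiers' and '#subpath' so purls from different tools compare equal."""
    return purl.split("?", 1)[0].split("#", 1)[0]


def only_in_container(source_purls: set, container_sbom: dict) -> set:
    """Return purls present in the container SBOM but absent from the source-tree scan."""
    source = {strip_qualifiers(p) for p in source_purls}
    shipped = {
        strip_qualifiers(c["purl"])
        for c in container_sbom.get("components", [])
        if c.get("purl")
    }
    return shipped - source


# Hypothetical source-tree result: only what the lock-file declares.
SOURCE_TREE_PURLS = {"pkg:pypi/requests@2.31.0"}

# Hypothetical container SBOM: the same Python package plus base-image OS packages.
CONTAINER_SBOM = {
    "components": [
        {"name": "requests", "purl": "pkg:pypi/requests@2.31.0"},
        {"name": "libssl1.1", "purl": "pkg:deb/debian/libssl1.1@1.1.1n?arch=amd64"},
        {"name": "bash", "purl": "pkg:deb/debian/bash@5.1?arch=amd64"},
    ]
}

if __name__ == "__main__":
    for purl in sorted(only_in_container(SOURCE_TREE_PURLS, CONTAINER_SBOM)):
        print("invisible to source-tree scan:", purl)
```

Exact purl matching is a simplification: real scanners may disagree on namespace or version formatting, so a production comparison would likely need more normalisation than this.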

Failure modes¶

concepts/uber-jar-metadata-loss is a container-scan failure mode too — if the JVM app ships as a shaded uber-jar inside the image, the scanner sees one jar and misses all the constituent libraries. The remediation is to fix the build (stop shading) rather than change the scan locus.
Scratch / distroless images with binaries that bundle their dependencies statically (Go, Rust) can be harder to scan because the filesystem has no package metadata; scanners fall back to binary introspection heuristics with lower fidelity.
Layer caching can surface stale packages. If the base image layer isn't rebuilt, the SBOM continues to report old OS-package versions. Discipline: rebuild the image regularly even when app code hasn't changed.
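Two of the failure modes above leave recognisable traces in the SBOM itself, so a pipeline could flag suspicious results before trusting them. The sketch below is a heuristic under stated assumptions: the rules (no OS packages at all, exactly one Maven component) and the warning names are invented for illustration and are not behaviour of syft or grype.

```python
# purl types used by the common OS package managers (dpkg, rpm, apk).
OS_PURL_TYPES = {"deb", "rpm", "apk"}


def ecosystem(purl: str) -> str:
    """Return the purl type, e.g. 'maven' from 'pkg:maven/org.foo/bar@1.0'."""
    if not purl.startswith("pkg:"):
        return "unknown"
    return purl[len("pkg:"):].split("/", 1)[0]


def fidelity_warnings(sbom: dict) -> list:
    """Return hypothetical warning codes for SBOMs that look incomplete."""
    ecosystems = [ecosystem(c.get("purl", "")) for c in sbom.get("components", [])]
    warnings = []
    # Scratch / distroless images carry no dpkg, rpm, or apk database.
    if not OS_PURL_TYPES.intersection(ecosystems):
        warnings.append("no-os-packages")
    # A JVM image with exactly one Maven component may be a shaded uber-jar.
    if ecosystems.count("maven") == 1:
        warnings.append("single-maven-component")
    return warnings


# Hypothetical SBOM fragments for three kinds of image.
DISTROLESS_GO = {"components": [
    {"name": "github.com/foo/bar", "purl": "pkg:golang/github.com/foo/bar@v1.2.3"},
]}
SHADED_JVM = {"components": [
    {"name": "openssl", "purl": "pkg:deb/debian/openssl@3.0.11"},
    {"name": "app", "purl": "pkg:maven/com.example/app@1.0.0"},
]}
ORDINARY_PYTHON = {"components": [
    {"name": "openssl", "purl": "pkg:deb/debian/openssl@3.0.11"},
    {"name": "requests", "purl": "pkg:pypi/requests@2.31.0"},
]}

if __name__ == "__main__":
    for label, sbom in [("go", DISTROLESS_GO), ("jvm", SHADED_JVM), ("py", ORDINARY_PYTHON)]:
        print(label, fidelity_warnings(sbom))
```

A warning here only means the SBOM deserves a second look; a small JVM service with a single genuine dependency would trip the same rule.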

Canonical wiki instance (Zalando 2023-04-12)¶

Zalando's SBOM pipeline scans the container image of every deployed application, not the source repo. The post states this explicitly: "We publish a curated data set containing dependency data from the SBOM for every application we deploy, based on its Container image" (Source: ). This choice is what makes cross-language analytics over the fleet tractable: whether an app is written in Python, Go, Java, Kotlin, Scala, or JavaScript, its SBOM has the same shape, rooted in the container layer structure.
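Because every application's SBOM has the same shape, a fleet-wide question such as "which deployed apps ship package X?" reduces to one loop over all SBOMs. The following is a minimal sketch of that idea, not Zalando's implementation; the app names, versions, and the `apps_shipping` helper are hypothetical.

```python
def apps_shipping(fleet: dict, package_name: str) -> dict:
    """Map each app to the sorted versions of `package_name` found in its SBOM.

    `fleet` maps an application name to its CycloneDX-shaped SBOM dict.
    Apps that do not ship the package are omitted from the result.
    """
    hits = {}
    for app, sbom in fleet.items():
        versions = sorted({
            component.get("version", "?")
            for component in sbom.get("components", [])
            if component.get("name") == package_name
        })
        if versions:
            hits[app] = versions
    return hits


# Hypothetical fleet: three apps in different languages, same SBOM shape.
FLEET = {
    "checkout-api": {"components": [
        {"name": "libssl1.1", "version": "1.1.1n"},
        {"name": "requests", "version": "2.31.0"},
    ]},
    "search-indexer": {"components": [
        {"name": "libssl3", "version": "3.0.11"},
        {"name": "jackson-databind", "version": "2.15.2"},
    ]},
    "pricing": {"components": [
        {"name": "libssl1.1", "version": "1.1.1w"},
    ]},
}

if __name__ == "__main__":
    print(apps_shipping(FLEET, "libssl1.1"))
```

The query never needs to know which language an app is written in, which is the property the uniform per-image shape buys.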

Trade-offs vs source-tree scan¶

Dimension	Source-tree	Container-extracted
OS packages	❌ invisible	✅ captured
Vendored copies	❌ partial	✅ captured
Fidelity to shipped artifact	❌ build-intent only	✅ what ships
Speed	✅ fast (no build)	❌ requires built image
CI integration	✅ pre-build stage	❌ post-build stage
Per-language fidelity	✅ high (native metadata)	⚠️ depends on scanner heuristics
Supports fleet analytics	❌ repo-by-repo shape	✅ uniform per-image shape

Zalando's choice — container-extracted — optimises for the fleet analytics row, at the cost of scanning post-build rather than pre-build.

Seen in¶

— canonical wiki instance. Container-image SBOM extraction as the fleet-wide standard at Zalando.

concepts/sbom-software-bill-of-materials — parent concept.
concepts/uber-jar-metadata-loss — the JVM-specific failure mode that survives the choice of scan locus.
patterns/sbom-as-queryable-data-lake-asset — the downstream pattern that the container-extracted uniformity enables.
systems/syft · systems/grype — the canonical tooling.
systems/docker — the substrate.