CONCEPT Cited by 1 source
Container-extracted SBOM
Definition
A container-extracted SBOM is a Software Bill of Materials
produced by scanning the built container image rather than
the source tree or package lock-files. The scanner
reads the layered filesystem, identifies OS-level packages
(via /var/lib/dpkg, /var/lib/rpm, apk indexes),
language-ecosystem package metadata
(package-lock.json, go.sum, pom.xml artifacts,
*.dist-info/METADATA), and binary markers left by package
managers, then emits a unified SBOM spanning all of them.
The canonical tool is syft; its output (in systems/cyclonedx or systems/spdx format) is consumed by grype for CVE matching.
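As a rough illustration of the OS-package step described above, the sketch below parses the text of a Debian-style dpkg status file into name/version pairs. It is a minimal sketch, not how syft works internally: the `OsPackage` type, the `parse_dpkg_status` function, and the sample data are all hypothetical, and it handles only the `Package`, `Status`, and `Version` fields.

```python
from typing import Iterator, NamedTuple


class OsPackage(NamedTuple):
    name: str
    version: str


def parse_dpkg_status(text: str) -> Iterator[OsPackage]:
    """Yield installed packages from the contents of a dpkg status file."""
    # Stanzas (one per package) are separated by blank lines.
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; this sketch ignores them.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        # Report only packages whose status ends in "installed".
        if "Package" in fields and fields.get("Status", "").endswith(" installed"):
            yield OsPackage(fields["Package"], fields.get("Version", ""))


# Illustrative sample data (made up for this example).
SAMPLE = """\
Package: libssl1.1
Status: install ok installed
Version: 1.1.1n-0+deb11u4
Description: Secure Sockets Layer toolkit
 shared libraries

Package: oldpkg
Status: deinstall ok config-files
Version: 0.9
"""

packages = list(parse_dpkg_status(SAMPLE))
print(packages)
```

A real scanner repeats this kind of extraction for each metadata source it understands (rpm databases, apk indexes, language-ecosystem files) and merges the results into one document.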
Why it matters: source-tree scans miss things
A source-tree scan only captures what the language package manager knows about. It cannot see:
- OS packages contributed by the base image. If your FROM python:3.11-slim base inherits a vulnerable libssl1.1, a source-tree pip-audit run never sees it. The container-extracted SBOM does.
- Vendored copies not in the lock-file. Code copied into the repo (common in older Go / C++ codebases) is invisible to the language package manager but visible to the container scan.
- What actually shipped, not what the build intended to ship. A multi-stage Docker build might discard build-time dependencies in the final stage; source-tree scans don't know this. The container scan sees only the runtime image.
- Contributions from tooling. Shell utilities, language runtimes, and compiled binaries added by RUN steps are all invisible to source-tree scanners.
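The gap between the two scan loci can be made concrete by diffing the component sets each scan reports. The sketch below assumes both SBOMs have already been reduced to sets of package URL (purl) strings; the function name `only_in_container` and all purl values are made up for illustration.

```python
def only_in_container(source_purls: set, container_purls: set) -> dict:
    """Group components seen only by the container scan by purl type."""
    grouped = {}
    for purl in sorted(container_purls - source_purls):
        # A purl looks like "pkg:<type>/<namespace>/<name>@<version>".
        purl_type = purl.split(":", 1)[1].split("/", 1)[0]
        grouped.setdefault(purl_type, []).append(purl)
    return grouped


# Hypothetical scan results for one Python application.
source_tree = {
    "pkg:pypi/requests@2.31.0",
    "pkg:pypi/urllib3@2.0.7",
}
container = source_tree | {
    "pkg:deb/debian/libssl1.1@1.1.1n",
    "pkg:deb/debian/bash@5.1-2",
    "pkg:golang/github.com/example/tool@v1.2.3",
}

extra = only_in_container(source_tree, container)
for purl_type, purls in extra.items():
    print(purl_type, purls)
```

In this made-up example the `deb` entries correspond to packages inherited from the base image, and the `golang` entry to a binary added during the image build.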
Failure modes
- concepts/uber-jar-metadata-loss is a container-scan failure mode too — if the JVM app ships as a shaded uber-jar inside the image, the scanner sees one jar and misses all the constituent libraries. The remediation is to fix the build (stop shading) rather than change the scan locus.
- Scratch / distroless images with binaries that bundle their dependencies statically (Go, Rust) can be harder to scan because the filesystem has no package metadata; scanners fall back to binary introspection heuristics with lower fidelity.
- Layer caching can surface stale packages. If the base image layer isn't rebuilt, the SBOM continues to report old OS-package versions. Discipline: rebuild the image regularly even when app code hasn't changed.
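For the uber-jar case, one quick heuristic is to check how much per-library Maven metadata survives inside the jar. The sketch below lists the `META-INF/maven/<groupId>/<artifactId>/pom.properties` entries, which is the Maven convention for embedding coordinates. Whether a particular scanner relies on these files is an assumption here, and the helper names and demo jar contents are hypothetical.

```python
import io
import zipfile


def embedded_maven_coordinates(jar_bytes: bytes) -> list:
    """Return groupId:artifactId pairs that still have pom.properties in the jar."""
    coords = []
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for name in jar.namelist():
            parts = name.split("/")
            # Expected layout: META-INF/maven/<groupId>/<artifactId>/pom.properties
            if (
                len(parts) == 5
                and parts[:2] == ["META-INF", "maven"]
                and parts[4] == "pom.properties"
            ):
                coords.append(f"{parts[2]}:{parts[3]}")
    return sorted(coords)


def build_demo_jar(entries: list) -> bytes:
    """Build an in-memory jar (a zip archive) with empty files, for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for entry in entries:
            jar.writestr(entry, "")
    return buf.getvalue()


# A made-up shaded jar: library classes are present, but only the
# application's own Maven metadata survived the build.
shaded = build_demo_jar([
    "META-INF/maven/com.example/app/pom.properties",
    "com/example/App.class",
    "shaded/org/apache/commons/lang3/StringUtils.class",
])
coords = embedded_maven_coordinates(shaded)
print(coords)
```

A jar that bundles many libraries but reports only its own coordinates is a hint that the constituent libraries will be missing from the SBOM.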
Canonical wiki instance (Zalando 2023-04-12)
Zalando's SBOM pipeline scans the container image for every deployed application, not the source repo. The post names this explicitly: "We publish a curated data set containing dependency data from the SBOM for every application we deploy, based on its Container image" (Source: sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game). This choice is what makes cross-language analytics over the fleet tractable: whether an app is Python, Go, Java, Kotlin, Scala, or JavaScript, its SBOM has a uniform shape rooted in the container layer structure.
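To show why a uniform per-image shape helps, the sketch below runs one query across several applications' SBOMs regardless of language. It assumes each SBOM is a CycloneDX JSON document with a top-level `components` array; the `apps_using` function, the application names, and the component data are hypothetical and do not describe Zalando's actual pipeline.

```python
import json


def apps_using(sboms: dict, package: str) -> dict:
    """Map application name to the sorted versions of `package` in its SBOM."""
    hits = {}
    for app, raw in sboms.items():
        doc = json.loads(raw)
        versions = sorted(
            {
                component.get("version", "")
                for component in doc.get("components", [])
                if component.get("name") == package
            }
        )
        if versions:
            hits[app] = versions
    return hits


def make_sbom(components: list) -> str:
    """Build a minimal CycloneDX-shaped JSON document for the demo."""
    return json.dumps(
        {"bomFormat": "CycloneDX", "specVersion": "1.4", "components": components}
    )


# Made-up fleet: a Python service and a Java service share an OS package.
fleet = {
    "checkout-api": make_sbom([
        {"type": "library", "name": "requests", "version": "2.31.0"},
        {"type": "library", "name": "openssl", "version": "3.0.11"},
    ]),
    "catalog-service": make_sbom([
        {"type": "library", "name": "jackson-databind", "version": "2.15.2"},
        {"type": "library", "name": "openssl", "version": "3.0.9"},
    ]),
}

result = apps_using(fleet, "openssl")
print(result)
```

The same function works for every application because the query depends only on the SBOM shape, not on the language or build system behind the image.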
Trade-offs vs source-tree scan
| Dimension | Source-tree | Container-extracted |
|---|---|---|
| OS packages | ❌ invisible | ✅ captured |
| Vendored copies | ❌ partial | ✅ captured |
| Fidelity to shipped artifact | ❌ build-intent only | ✅ what ships |
| Speed | ✅ fast (no build) | ❌ requires built image |
| CI integration | ✅ pre-build stage | ❌ post-build stage |
| Per-language fidelity | ✅ high (native metadata) | ⚠️ depends on scanner heuristics |
| Supports fleet analytics | ❌ repo-by-repo shape | ✅ uniform per-image shape |
Zalando's choice — container-extracted — optimises for the fleet analytics row, at the cost of scanning post-build rather than pre-build.
Seen in
- sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game — canonical wiki instance. Container-image SBOM extraction as the fleet-wide standard at Zalando.
Related
- concepts/sbom-software-bill-of-materials — parent concept.
- concepts/uber-jar-metadata-loss — the JVM-specific failure mode that survives the choice of scan locus.
- patterns/sbom-as-queryable-data-lake-asset — the downstream pattern that the container-extracted uniformity enables.
- systems/syft · systems/grype — the canonical tooling.
- systems/docker — the substrate.