PATTERN Cited by 1 source
SBOM-driven dependency bloat audit
Intent
Use the fleet-wide SBOM corpus to discover applications with anomalously heavy dependency footprints — particularly apps that import entire umbrella libraries (e.g. the full AWS SDK) when a much smaller subset of modules would suffice. The audit is framed as "which apps are shipping 10× more bytes than they need to?", answerable by a single SQL query across the SBOM dataset, with the wins cashing out as smaller images, faster builds, lower cold-start times, and reduced attack surface.
This pattern is the fleet-altitude complement to concepts/transitive-dependency-reachability. The Datadog goda reach work operates at the per-binary altitude ("which edge in this binary's import graph drags in 30 MiB of k8s packages?"), while the SBOM bloat audit operates at the cross-fleet altitude ("which apps across our fleet import the full AWS SDK instead of just the S3 + DynamoDB modules they actually use?").
Canonical wiki instance (Zalando 2023-04-12)
Zalando discovered AWS SDK over-import via SBOM audit:
"Another insight from analyzing the SBOM data was our usage of the AWS SDK. We noticed that some applications were using the full SDK (200MB+ in Java) instead of its individual modules. Addressing this finding helped reduce build times and lower resulting docker image size significantly." (Source: sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game)
The operational wins, as reported by the source:
- Full AWS SDK in Java: "200MB+" — the quantified footprint the audit targeted.
- Build time: reduced (magnitude unspecified).
- Container image size: "lower resulting docker image size significantly".
- Attack surface (implied, not quoted): fewer transitive deps → fewer CVEs to track per app.
The audit query shape
A representative SQL query over the SBOM table:
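```sql
-- Find apps whose SBOM contains the AWS SDK for Java umbrella artifact
-- (v2: software.amazon.awssdk:aws-sdk-java; v1: com.amazonaws:aws-java-sdk)
-- rather than only the per-service modules.
-- Assumes components are stored as 'group:artifact' strings.
SELECT app, image_digest, COUNT(*) AS aws_deps
FROM sboms
WHERE component LIKE 'software.amazon.awssdk:%'
   OR component LIKE 'com.amazonaws:aws-java-sdk%'
GROUP BY app, image_digest
HAVING SUM(CASE WHEN component IN ('software.amazon.awssdk:aws-sdk-java',
                                   'com.amazonaws:aws-java-sdk')
                THEN 1 ELSE 0 END) > 0
ORDER BY aws_deps DESC;
```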
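To sanity-check the query shape, here is a minimal Python harness that loads a made-up fixture into an in-memory SQLite table and runs the umbrella query against it, matching both the v2 (software.amazon.awssdk) and v1 (com.amazonaws) coordinates. The `sboms(app, image_digest, component)` schema, the 'group:artifact' component encoding, and every app name and digest are assumptions for illustration; a real SBOM store will have its own schema and SQL dialect.

```python
import sqlite3

# Hypothetical fixture: one row per (app, image, component), with components
# stored as 'group:artifact' strings. App names and digests are invented.
ROWS = [
    ("checkout", "sha256:aaa", "software.amazon.awssdk:aws-sdk-java"),
    ("checkout", "sha256:aaa", "software.amazon.awssdk:s3"),
    ("checkout", "sha256:aaa", "software.amazon.awssdk:dynamodb"),
    ("checkout", "sha256:aaa", "software.amazon.awssdk:ec2"),
    ("catalog", "sha256:bbb", "software.amazon.awssdk:s3"),
    ("catalog", "sha256:bbb", "software.amazon.awssdk:sdk-core"),
    ("legacy", "sha256:ccc", "com.amazonaws:aws-java-sdk"),
    ("legacy", "sha256:ccc", "com.amazonaws:aws-java-sdk-s3"),
]

# Umbrella query: count AWS SDK components per image, keep only images whose
# SBOM includes a full-SDK umbrella artifact, heaviest first.
UMBRELLA_QUERY = """
SELECT app, image_digest, COUNT(*) AS aws_deps
FROM sboms
WHERE component LIKE 'software.amazon.awssdk:%'
   OR component LIKE 'com.amazonaws:aws-java-sdk%'
GROUP BY app, image_digest
HAVING SUM(CASE WHEN component IN ('software.amazon.awssdk:aws-sdk-java',
                                   'com.amazonaws:aws-java-sdk')
                THEN 1 ELSE 0 END) > 0
ORDER BY aws_deps DESC
"""


def find_umbrella_importers(rows):
    """Load rows into an in-memory SQLite table and run the umbrella query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sboms (app TEXT, image_digest TEXT, component TEXT)")
    conn.executemany("INSERT INTO sboms VALUES (?, ?, ?)", rows)
    result = conn.execute(UMBRELLA_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for app, digest, n in find_umbrella_importers(ROWS):
        print(app, digest, n)
```

With this fixture, `checkout` (4 AWS components) and `legacy` (2) come back, while `catalog`, which uses only modular artifacts, is filtered out by the `HAVING` clause.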
Or more generally — find outlier apps by total dep count:
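```sql
-- Find outlier apps by total dependency count, using each app's latest SBOM.
-- Assumes a hypothetical scanned_at column, identical for all rows of one image.
WITH latest AS (
  SELECT app, MAX(scanned_at) AS scanned_at FROM sboms GROUP BY app
)
SELECT s.app, COUNT(*) AS dep_count
FROM sboms s
JOIN latest l ON l.app = s.app AND l.scanned_at = s.scanned_at
GROUP BY s.app
ORDER BY dep_count DESC
LIMIT 20; -- investigate the heavy tail
```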
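Both queries assume SBOMs have already been flattened into rows. A minimal sketch of that ingestion step for CycloneDX JSON follows; it assumes the same hypothetical `(app, image_digest, component)` row shape and renders Maven-style components as 'group:name'. This is not Zalando's pipeline, and the schema is an assumption.

```python
import json


def cyclonedx_to_rows(app, image_digest, sbom_json):
    """Flatten a CycloneDX JSON SBOM into (app, image_digest, component) rows.

    Components with a "group" (the Maven convention) become 'group:name';
    others become plain 'name'. Nested "components" arrays are walked too,
    and duplicate rows are dropped. Rows are returned sorted.
    """
    doc = json.loads(sbom_json)
    rows = set()
    stack = list(doc.get("components", []))
    while stack:
        comp = stack.pop()
        group = comp.get("group")
        name = comp["name"]
        rows.add((app, image_digest, f"{group}:{name}" if group else name))
        stack.extend(comp.get("components", []))
    return sorted(rows)


# Illustrative SBOM; versions are placeholders, and s3 appears twice on purpose.
EXAMPLE_SBOM = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "components": [
        {"type": "library", "group": "software.amazon.awssdk", "name": "s3", "version": "2.20.0"},
        {"type": "library", "group": "software.amazon.awssdk", "name": "dynamodb", "version": "2.20.0"},
        {"type": "library", "group": "software.amazon.awssdk", "name": "s3", "version": "2.20.0"},
        {"type": "library", "name": "zlib", "version": "1.2.13"},
    ],
})

if __name__ == "__main__":
    for row in cyclonedx_to_rows("checkout", "sha256:aaa", EXAMPLE_SBOM):
        print(row)
```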
Each query returns a ranked list of remediation candidates; each candidate gets a ticket + a standard fix template ("replace software.amazon.awssdk:aws-sdk-java with software.amazon.awssdk:s3 + software.amazon.awssdk:dynamodb", etc.).
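Writing the fix template requires knowing which service modules an app actually uses. One hedged way to draft it is to scan Java sources for v2 service-client imports, as sketched below. It assumes the package-to-artifact naming convention stated in the comment, and the function names and ticket wording are hypothetical; compiling after the swap is the real check.

```python
import re

# Assumption: AWS SDK for Java v2 service clients live in packages named
# software.amazon.awssdk.services.<service>, and the matching Maven artifact is
# software.amazon.awssdk:<service>. That holds for s3 and dynamodb; verify the
# artifact name for any other service before trusting a suggestion.
SERVICE_IMPORT = re.compile(
    r"^\s*import\s+(?:static\s+)?software\.amazon\.awssdk\.services\.([a-z0-9]+)\.",
    re.MULTILINE,
)


def suggest_modules(java_sources):
    """Per-service artifacts implied by v2 service-client imports across all sources."""
    services = set()
    for src in java_sources:
        services.update(SERVICE_IMPORT.findall(src))
    return sorted(f"software.amazon.awssdk:{svc}" for svc in services)


def fix_template(app, java_sources):
    """Render a hypothetical remediation-ticket line for an umbrella importer."""
    modules = suggest_modules(java_sources)
    if not modules:
        return f"{app}: no v2 service-client imports found; inspect manually"
    return f"{app}: replace software.amazon.awssdk:aws-sdk-java with " + " + ".join(modules)


# Invented source snippets; core (non-service) imports are ignored by the regex.
SOURCES = [
    "import software.amazon.awssdk.services.s3.S3Client;\n"
    "import software.amazon.awssdk.core.sync.RequestBody;\n",
    "import software.amazon.awssdk.services.dynamodb.DynamoDbClient;\n"
    "import software.amazon.awssdk.services.s3.model.PutObjectRequest;\n",
]

if __name__ == "__main__":
    print(fix_template("checkout", SOURCES))
```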
Why it works
- Cross-fleet comparison reveals outliers. Apps that import 200 MB of AWS SDK don't stand out inside their own repo — they look normal to their team. They stand out dramatically in a sorted fleet-wide query.
- One-time investment, recurring wins. Once the modular-import pattern is established + the fix template is written, subsequent apps benefit automatically via template updates.
- Cashes out as multiple operational metrics. Build time + image size + cold start + attack surface + patch cycle are all correlated with dependency count (concepts/dependency-count-by-language-ecosystem).
- The SBOM corpus is the enabler. Without the data-lake shape, audits like this reduce to per-repo scripting and don't scale.
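Sorting by dependency count and reading the top of the list is all the pattern requires. If the outlier cut should be automatic, one option (a swapped-in technique, not something the source describes) is a robust modified z-score over per-app dependency counts. The fleet numbers below are invented.

```python
from statistics import median


def flag_outliers(dep_counts, threshold=3.5):
    """Return (app, score) for apps whose dependency count is a high outlier.

    Scores use the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. Median-based statistics are barely moved by the
    handful of extreme apps being hunted, unlike mean and standard deviation.
    Results are sorted with the strongest outlier first.
    """
    counts = list(dep_counts.values())
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # At least half the fleet shares the median count; the score is undefined.
        return []
    scored = [(app, 0.6745 * (c - med) / mad) for app, c in dep_counts.items()]
    return sorted([p for p in scored if p[1] > threshold], key=lambda p: p[1], reverse=True)


# Hypothetical per-app dependency counts from the latest SBOM of each app.
FLEET = {"catalog": 140, "search": 155, "cart": 150, "auth": 130, "payments": 160, "checkout": 620}

if __name__ == "__main__":
    print(flag_outliers(FLEET))
```

Here only `checkout` clears the commonly used 3.5 threshold; the threshold itself is a tunable assumption.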
Beyond AWS SDK: other umbrella-import patterns
The AWS SDK case is representative of a broader anti-pattern where a single "convenience" umbrella package drags in a heavy transitive graph. Similar cases to watch for in SBOM audits:
- aws-sdk (Node.js, v2 vs modular v3) — v2 imported all services; v3 is modular.
- Apache Spark uber-jars — pulling in the full Spark distribution when only core + SQL is used.
- TensorFlow CPU vs GPU — container sizes differ by roughly an order of magnitude.
- Boost C++ libraries — linking all of Boost when only specific sub-libraries are used.
- Kubernetes client libraries — Datadog's trace-agent case (Datadog 2026-02-18), where a single function dragged in 526 k8s.io/* packages; the same pattern repeated across a fleet would surface in an SBOM audit.
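Turning this list into something an audit can run means matching SBOM components against a watch list. A minimal sketch follows, assuming components are identified by package URLs (purls); the watch-list entries, function names, and fixture versions are illustrative, and the parser handles only the common purl forms.

```python
from urllib.parse import unquote

# Hypothetical watch list keyed by purl (type, namespace, name). Only the AWS
# SDK cases are encoded because their coordinates are unambiguous; Spark,
# TensorFlow, and Boost entries depend on how each ecosystem packages them.
UMBRELLAS = {
    ("npm", None, "aws-sdk"): "AWS SDK for JavaScript v2; consider @aws-sdk/client-* (v3)",
    ("maven", "com.amazonaws", "aws-java-sdk"): "AWS SDK for Java v1 umbrella",
    ("maven", "software.amazon.awssdk", "aws-sdk-java"): "AWS SDK for Java v2 umbrella",
}


def parse_purl(purl):
    """Minimal purl parser returning (type, namespace, name); not spec-complete."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    # Subpath (#...) and qualifiers (?...) come after the version; drop them.
    path = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    ptype, _, rest = path.partition("/")
    segments = rest.split("/")
    # A literal '@' in the last segment separates name from version; an '@' in
    # a namespace (npm scopes) is expected to be percent-encoded as %40.
    name = unquote(segments[-1].split("@", 1)[0])
    namespace = "/".join(unquote(s) for s in segments[:-1]) or None
    return ptype.lower(), namespace, name


def flag_umbrellas(purls):
    """Return (purl, reason) for each SBOM component on the watch list."""
    hits = []
    for purl in purls:
        key = parse_purl(purl)
        if key in UMBRELLAS:
            hits.append((purl, UMBRELLAS[key]))
    return hits


# Illustrative SBOM components; versions are placeholders.
COMPONENTS = [
    "pkg:npm/aws-sdk@2.1400.0",
    "pkg:npm/%40aws-sdk/client-s3@3.300.0",
    "pkg:maven/com.amazonaws/aws-java-sdk@1.12.400",
    "pkg:maven/com.amazonaws/aws-java-sdk-s3@1.12.400",
]

if __name__ == "__main__":
    for purl, reason in flag_umbrellas(COMPONENTS):
        print(purl, "->", reason)
```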
Anti-patterns
- Per-team size-budget policing. Works for individual teams but misses the fleet pattern. Each team's umbrella import looks reasonable until you see every team is doing it.
- Image-size alarms without dependency-level attribution. "Your image is too big" without "because you imported the full SDK" produces no behaviour change.
- One-time cleanup. Audit once, don't re-run. New apps created from stale templates re-introduce the bloat. The audit should be recurring (monthly/quarterly) with a dashboard that surfaces the top 20 heaviest apps.
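A recurring audit is most useful when each run is compared with the last. The sketch below diffs two runs' top-N lists; the function names, the quarterly cadence, and the numbers are hypothetical.

```python
def top_n(dep_counts, n=20):
    """Heaviest n apps by dependency count; ties broken alphabetically by app."""
    ranked = sorted(dep_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [app for app, _ in ranked[:n]]


def audit_diff(previous, current, n=20):
    """Compare two audit runs (app -> dependency count).

    Returns (new_entrants, dropped_out): apps that entered the top n since the
    previous run, such as a new app copied from a stale template, and apps that
    left it, for example after remediation.
    """
    prev_top = set(top_n(previous, n))
    curr_top = top_n(current, n)
    new_entrants = [app for app in curr_top if app not in prev_top]
    dropped_out = sorted(prev_top - set(curr_top))
    return new_entrants, dropped_out


# Invented counts for two consecutive audit runs.
LAST_QUARTER = {"checkout": 620, "legacy": 500, "catalog": 140}
THIS_QUARTER = {"checkout": 180, "legacy": 510, "catalog": 140, "new-svc": 600}

if __name__ == "__main__":
    print(audit_diff(LAST_QUARTER, THIS_QUARTER, n=2))
```

With `n=2`, `new-svc` shows up as a new entrant and `checkout` as dropped out, the two signals a recurring dashboard would surface.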
Integration with related patterns
- patterns/sbom-as-queryable-data-lake-asset — the foundation. Without a fleet-wide SBOM corpus, this audit is infeasible at scale.
- concepts/transitive-dependency-reachability — the per-binary complement. Use the SBOM audit to find which apps are bloated; use language-native dependency-graph tools (goda reach, webpack-bundle-analyzer, mvn dependency:tree) to find which edges to cut inside each app.
- patterns/single-function-forced-package-split — the Datadog-specific surgical fix for one heavy dep edge.
- patterns/template-project-nudges-consistency — make the fix stick by updating the template new apps copy.
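For the per-binary step, one way to find which edge to cut is to rank an app's direct dependencies by how many packages become unreachable when that single edge is removed. The BFS sketch below works in the spirit of goda reach but is not its implementation, and the dependency graph is made up.

```python
from collections import deque


def reachable(graph, start, skip_edge=None):
    """All packages reachable from start by BFS, optionally ignoring one (src, dst) edge."""
    seen, queue = {start}, deque([start])
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, ()):
            if (pkg, dep) == skip_edge or dep in seen:
                continue
            seen.add(dep)
            queue.append(dep)
    return seen


def edge_weights(graph, root):
    """For each direct dependency of root, count packages lost if that one edge is cut."""
    full = reachable(graph, root)
    weights = (
        (dep, len(full - reachable(graph, root, skip_edge=(root, dep))))
        for dep in graph.get(root, ())
    )
    return sorted(weights, key=lambda p: p[1], reverse=True)


# Hypothetical graph: the app uses s3 directly but also pulls the umbrella SDK.
GRAPH = {
    "app": ["aws-sdk-java", "s3", "json"],
    "aws-sdk-java": ["s3", "dynamodb", "ec2", "iam"],
    "s3": ["sdk-core"],
    "dynamodb": ["sdk-core"],
    "ec2": ["sdk-core"],
    "iam": ["sdk-core"],
    "json": [],
}

if __name__ == "__main__":
    print(edge_weights(GRAPH, "app"))
```

Cutting `app -> aws-sdk-java` drops four packages, while cutting `app -> s3` drops none because the umbrella also reaches `s3`; that asymmetry is what points at the umbrella edge.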
Seen in
- sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game — canonical wiki instance. AWS SDK full-vs-modular audit at Zalando. Build-time + image-size wins quantified only as "significantly"; the 200 MB+ umbrella footprint is the quantified input.
Related
- patterns/sbom-as-queryable-data-lake-asset — the foundation this pattern runs on.
- concepts/sbom-software-bill-of-materials — the artifact.
- concepts/dependency-count-by-language-ecosystem — the empirical distribution that motivates outlier-hunting.
- concepts/transitive-dependency-reachability — the per-binary complement.
- patterns/single-function-forced-package-split — the surgical fix when an audit surfaces a one-edge-heavy-subgraph case.