PATTERN

SBOM-driven dependency bloat audit¶

Intent¶

Use the fleet-wide SBOM corpus to discover applications with anomalously heavy dependency footprints — particularly apps that import entire umbrella libraries (e.g. the full AWS SDK) when a much smaller subset of modules would suffice. The audit is framed as "which apps are shipping 10× more bytes than they need to?", answerable by a single SQL query across the SBOM dataset, with the wins cashing out as smaller images, faster builds, lower cold-start times, and reduced attack surface.

This pattern is the fleet-altitude complement to concepts/transitive-dependency-reachability. The Datadog goda reach work operates at the per-binary altitude ("which edge in this binary's import graph drags in 30 MiB of k8s packages?"), while the SBOM bloat audit operates at the cross-fleet altitude ("which apps across our fleet import the full AWS SDK instead of just the S3 + DynamoDB modules they actually use?").

Canonical wiki instance (Zalando 2023-04-12)¶

Zalando discovered AWS SDK over-import via SBOM audit:

"Another insight from analyzing the SBOM data was our usage of the AWS SDK. We noticed that some applications were using the full SDK (200MB+ in Java) instead of its individual modules. Addressing this finding helped reduce build times and lower resulting docker image size significantly." (Source: )

The operational wins, with the source's wording quoted where available:

Full AWS SDK in Java: "200MB+" — the quantified footprint the audit targeted.
Build time: reduced (magnitude unspecified).
Container image size: "lower resulting docker image size significantly".
Attack surface (implied, not quoted): fewer transitive deps → fewer CVEs to track per app.

The audit query shape¶

A representative SQL query over the SBOM table:

-- Find apps that import the AWS SDK umbrella,
-- not the modular packages
SELECT app, image_digest, COUNT(*) AS aws_deps
FROM sboms
WHERE component LIKE 'software.amazon.awssdk%'
GROUP BY app, image_digest
HAVING SUM(CASE WHEN component = 'software.amazon.awssdk:aws-sdk-java' THEN 1 ELSE 0 END) > 0
ORDER BY aws_deps DESC;
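
To try the query shape end to end, the sketch below loads a few made-up rows into an in-memory SQLite database and runs the umbrella query unchanged. The three-column table layout (app, image_digest, component) is inferred from the query above; the app names, digests, and rows are invented for illustration, and a real SBOM store would likely carry more columns.

```python
import sqlite3

# The umbrella query from above, unchanged.
UMBRELLA_QUERY = """
SELECT app, image_digest, COUNT(*) AS aws_deps
FROM sboms
WHERE component LIKE 'software.amazon.awssdk%'
GROUP BY app, image_digest
HAVING SUM(CASE WHEN component = 'software.amazon.awssdk:aws-sdk-java' THEN 1 ELSE 0 END) > 0
ORDER BY aws_deps DESC;
"""

# Hypothetical rows: (app, image_digest, component).
ROWS = [
    ("checkout", "sha256:aaa", "software.amazon.awssdk:aws-sdk-java"),
    ("checkout", "sha256:aaa", "software.amazon.awssdk:s3"),
    ("checkout", "sha256:aaa", "software.amazon.awssdk:dynamodb"),
    ("checkout", "sha256:aaa", "org.slf4j:slf4j-api"),
    ("search", "sha256:bbb", "software.amazon.awssdk:s3"),
    ("search", "sha256:bbb", "com.fasterxml.jackson.core:jackson-databind"),
]


def find_umbrella_importers(rows):
    """Load rows into an in-memory table and return the umbrella importers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sboms (app TEXT, image_digest TEXT, component TEXT)")
    conn.executemany("INSERT INTO sboms VALUES (?, ?, ?)", rows)
    result = conn.execute(UMBRELLA_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for app, digest, aws_deps in find_umbrella_importers(ROWS):
        print(app, digest, aws_deps)
```

In this toy data only checkout carries the umbrella artifact, so search (which already uses a single module) is filtered out by the HAVING clause.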

Or more generally — find outlier apps by total dep count:

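-- Assumes a hypothetical scanned_at column recording when each SBOM was ingested;
-- the correlated subquery keeps only each app's most recent SBOM.
SELECT s.app, COUNT(*) AS dep_count
FROM sboms s
WHERE s.scanned_at = (
    SELECT MAX(l.scanned_at) FROM sboms l WHERE l.app = s.app
)
GROUP BY s.app
ORDER BY dep_count DESC
LIMIT 20;  -- investigate the tail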

The query returns a ranked list of remediation candidates; each gets a ticket + a standard fix template ("replace aws-sdk-java with the individual s3 + dynamodb modules" etc.).
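
The ticket step can be sketched as a small function that turns ranked query rows into ticket text. Everything here is hypothetical scaffolding: the Ticket shape, the FIX_TEMPLATES mapping, and the title format are assumptions for illustration, not something the source describes, and a real setup would call an issue tracker instead of returning objects.

```python
from dataclasses import dataclass

# Hypothetical fix templates, keyed by the umbrella component they address.
FIX_TEMPLATES = {
    "software.amazon.awssdk:aws-sdk-java":
        "Replace the full AWS SDK with only the service modules the app calls.",
}


@dataclass
class Ticket:
    app: str
    title: str
    body: str


def tickets_for(ranked_rows, umbrella):
    """Build one ticket per row.

    ranked_rows: iterable of (app, image_digest, dep_count), already sorted
    by the audit query so that rank 1 is the heaviest app.
    """
    template = FIX_TEMPLATES[umbrella]
    tickets = []
    for rank, (app, digest, dep_count) in enumerate(ranked_rows, start=1):
        tickets.append(Ticket(
            app=app,
            title=f"[dep-bloat #{rank}] {app} imports {umbrella}",
            body=f"Image {digest} carries {dep_count} matching components. {template}",
        ))
    return tickets


if __name__ == "__main__":
    rows = [("checkout", "sha256:aaa", 3)]
    for ticket in tickets_for(rows, "software.amazon.awssdk:aws-sdk-java"):
        print(ticket.title)
        print(ticket.body)
```

Keeping the template text in one mapping means a wording change reaches every future ticket without touching the query.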

Why it works¶

Cross-fleet comparison reveals outliers. Apps that import 200 MB of AWS SDK don't stand out inside their own repo — they look normal to their team. They stand out dramatically in a sorted fleet-wide query.
One-time investment, recurring wins. Once the modular-import pattern is established + the fix template is written, subsequent apps benefit automatically via template updates.
Cashes out as multiple operational metrics. Build time + image size + cold start + attack surface + patch cycle are all correlated with dependency count (concepts/dependency-count-by-language-ecosystem).
The SBOM corpus is the enabler. Without the data-lake shape, audits like this reduce to per-repo scripting and don't scale.
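
The "outliers stand out in a sorted fleet-wide query" point can be made slightly more mechanical than eyeballing a top-20 list. The sketch below flags apps whose dependency count sits far above the fleet median, using a median/MAD modified z-score. This statistic and its 3.5 cutoff are a common rule of thumb that I am swapping in for illustration; the source does not say how outliers were identified, and the app names and counts are invented.

```python
from statistics import median


def flag_outliers(dep_counts, threshold=3.5):
    """Return (app, count) pairs far above the fleet median, heaviest first.

    dep_counts: dict mapping app name -> dependency count.
    Uses the modified z-score 0.6745 * (x - median) / MAD, one-sided,
    so only unusually heavy apps are flagged.
    """
    values = list(dep_counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in the fleet: the score is undefined, so flag nothing.
        return []
    flagged = [
        (app, count)
        for app, count in dep_counts.items()
        if 0.6745 * (count - med) / mad > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    fleet = {"a": 100, "b": 110, "c": 95, "d": 105, "e": 900}
    print(flag_outliers(fleet))
```

A median-based score is used rather than mean and standard deviation because a single very heavy app would otherwise inflate the baseline it is being compared against.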

Beyond AWS SDK: other umbrella-import patterns¶

The AWS SDK case is representative of a broader anti-pattern where a single "convenience" umbrella package drags in a heavy transitive graph. Similar cases to watch for in SBOM audits:

aws-sdk (Node.js v2 vs v3 modular) — v2 imported all services; v3 is modular.
Apache Spark uber-jars — pulling in the full Spark distribution when only core + SQL is used.
TensorFlow CPU vs GPU — container image sizes can differ by an order of magnitude.
Boost C++ libraries — linking all of Boost when only specific sub-libraries are used.
Kubernetes client libraries — Datadog's trace-agent case (Datadog 2026-02-18) where a single function dragged in 526 k8s.io/* packages; the equivalent cross-fleet pattern would surface via an SBOM audit.
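
One way to generalise the audit beyond a single SDK is to keep a small rule table of known umbrella packages and match SBOM components against it. The sketch below assumes each SBOM row has been normalised to (app, ecosystem, name); that shape, the rule table, and the advice strings are illustrative assumptions. Only the AWS entries are listed, since those are the umbrella package names this page discusses.

```python
# Hypothetical rule table: (ecosystem, umbrella package, suggested fix).
UMBRELLA_RULES = [
    ("maven", "software.amazon.awssdk:aws-sdk-java",
     "depend on the individual service modules instead"),
    ("maven", "com.amazonaws:aws-java-sdk",
     "depend on the individual service modules instead"),
    ("npm", "aws-sdk",
     "move to the modular v3 @aws-sdk/client-* packages"),
]


def umbrella_hits(components):
    """Group umbrella imports by app.

    components: iterable of (app, ecosystem, name) tuples.
    Returns a dict mapping app -> list of (umbrella name, advice).
    """
    rules = {(eco, name): advice for eco, name, advice in UMBRELLA_RULES}
    hits = {}
    for app, eco, name in components:
        advice = rules.get((eco, name))
        if advice is not None:
            hits.setdefault(app, []).append((name, advice))
    return hits


if __name__ == "__main__":
    sample = [
        ("checkout", "maven", "software.amazon.awssdk:aws-sdk-java"),
        ("checkout", "maven", "org.slf4j:slf4j-api"),
        ("web", "npm", "aws-sdk"),
        ("web", "npm", "@aws-sdk/client-s3"),
    ]
    for app, found in umbrella_hits(sample).items():
        print(app, found)
```

Matching on the (ecosystem, name) pair rather than the name alone avoids false hits when different registries reuse a package name.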

Anti-patterns¶

Per-team size-budget policing. Works for individual teams but misses the fleet pattern. Each team's umbrella import looks reasonable until you see every team is doing it.
Image-size alarms without dependency-level attribution. "Your image is too big" without "because you imported the full SDK" produces no behaviour change.
One-time cleanup. Auditing once and never re-running means new apps created from stale templates re-introduce the bloat. The audit should be recurring (monthly/quarterly) with a dashboard that surfaces the top 20 heaviest apps.
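
The last two anti-patterns suggest what a recurring run should emit: findings that name the dependency responsible, and that catch umbrella imports appearing between runs. A minimal sketch, assuming each audit run can be reduced to a dict of app name to component-name set (a hypothetical snapshot format, not one the source describes):

```python
def audit_diff(previous, current, umbrellas):
    """Compare two audit snapshots and report newly added umbrella imports.

    previous, current: dict mapping app -> set of component names.
    umbrellas: set of component names treated as umbrella packages.
    Returns human-readable findings with dependency-level attribution.
    """
    findings = []
    for app, components in sorted(current.items()):
        # Apps missing from the previous run count as entirely new.
        added = components - previous.get(app, set())
        reintroduced = sorted(added & umbrellas)
        if reintroduced:
            findings.append(
                f"{app}: added umbrella import(s) {', '.join(reintroduced)} "
                f"since the last audit (+{len(added)} components)"
            )
    return findings


if __name__ == "__main__":
    umbrella = "software.amazon.awssdk:aws-sdk-java"
    last_month = {"checkout": {"software.amazon.awssdk:s3"}}
    this_month = {
        "checkout": {"software.amazon.awssdk:s3"},
        "newapp": {umbrella, "org.slf4j:slf4j-api"},
    }
    for line in audit_diff(last_month, this_month, {umbrella}):
        print(line)
```

Because the finding names the component, the message reads "because you imported the full SDK" rather than "your image is too big".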

Related patterns and concepts¶

patterns/sbom-as-queryable-data-lake-asset — the foundation. Without a fleet-wide SBOM corpus, this audit is infeasible at scale.
concepts/transitive-dependency-reachability — the per-binary complement. Use SBOM audit to find which apps are bloated; use language-native reachability tools (goda reach, webpack-bundle-analyzer, maven dependency:tree) to find which edges to cut inside each app.
patterns/single-function-forced-package-split — the Datadog-specific surgical fix for one heavy dep edge.
patterns/template-project-nudges-consistency — make the fix stick by updating the template new apps copy.

Seen in¶

— canonical wiki instance. AWS SDK full-vs-modular audit at Zalando. Build-time + image-size wins quantified only as "significantly"; the 200 MB+ umbrella footprint is the quantified input.

See also¶

patterns/sbom-as-queryable-data-lake-asset — the foundation this pattern runs on.
concepts/sbom-software-bill-of-materials — the artifact.
concepts/dependency-count-by-language-ecosystem — the empirical distribution that motivates outlier-hunting.
concepts/transitive-dependency-reachability — the per-binary complement.
patterns/single-function-forced-package-split — the surgical fix when an audit surfaces a one-edge-heavy-subgraph case.