
PATTERN Cited by 1 source

SBOM-driven dependency bloat audit

Intent

Use the fleet-wide SBOM corpus to discover applications with anomalously heavy dependency footprints — particularly apps that import entire umbrella libraries (e.g. the full AWS SDK) when a much smaller subset of modules would suffice. The audit is framed as "which apps are shipping 10× more bytes than they need to?", answerable by a single SQL query across the SBOM dataset, with the wins cashing out as smaller images, faster builds, lower cold-start times, and reduced attack surface.

This pattern is the fleet-altitude complement to concepts/transitive-dependency-reachability. The Datadog goda reach work operates at the per-binary altitude ("which edge in this binary's import graph drags in 30 MiB of k8s packages?"), whereas the SBOM bloat audit operates at the cross-fleet altitude ("which apps across our fleet import the full AWS SDK instead of just the S3 + DynamoDB modules they actually use?").

Canonical wiki instance (Zalando 2023-04-12)

Zalando discovered AWS SDK over-import via SBOM audit:

"Another insight from analyzing the SBOM data was our usage of the AWS SDK. We noticed that some applications were using the full SDK (200MB+ in Java) instead of its individual modules. Addressing this finding helped reduce build times and lower resulting docker image size significantly." (Source: sources/2023-04-12-zalando-how-software-bill-of-materials-change-the-dependency-game)

The operational wins, quoted where the source gives exact wording:

  • Full AWS SDK in Java: "200MB+" — the quantified footprint the audit targeted.
  • Build time: reduced (magnitude unspecified).
  • Container image size: "lower resulting docker image size significantly".
  • Attack surface (implied, not quoted): fewer transitive deps → fewer CVEs to track per app.

The audit query shape

A representative SQL query over the SBOM table:

-- Find apps that import the AWS SDK umbrella,
-- not the modular packages
SELECT app, image_digest, COUNT(*) AS aws_deps
FROM sboms
WHERE component LIKE 'software.amazon.awssdk%'
GROUP BY app, image_digest
HAVING SUM(CASE WHEN component = 'software.amazon.awssdk:aws-sdk-java' THEN 1 ELSE 0 END) > 0
ORDER BY aws_deps DESC;

Or more generally — find outlier apps by total dep count:

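-- Assumes a scanned_at timestamp column (hypothetical; adapt to your schema)
-- so that only the most recent SBOM per app is counted.
SELECT s.app, COUNT(*) AS dep_count
FROM sboms s
WHERE s.scanned_at = (
    SELECT MAX(s2.scanned_at) FROM sboms s2 WHERE s2.app = s.app
)
GROUP BY s.app
ORDER BY dep_count DESC
LIMIT 20;  -- investigate the heavy tail of the distribution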

The query returns a ranked list of remediation candidates. Each candidate gets a ticket plus a standard fix template (for example, "replace software.amazon.awssdk:aws-sdk-java with software.amazon.awssdk:s3 + software.amazon.awssdk:dynamodb").
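The umbrella-import query above can be exercised end to end against a toy table. The following is a minimal sketch, assuming a flat sboms table with app, image_digest, and component columns (the shape the query implies, not a confirmed Zalando schema). The rows, app names, and the ticket_title helper are hypothetical illustrations.

```python
import sqlite3

# Toy SBOM rows: (app, image_digest, component). All values are made up.
ROWS = [
    ("checkout", "sha256:aaa", "software.amazon.awssdk:aws-sdk-java"),
    ("checkout", "sha256:aaa", "software.amazon.awssdk:s3"),
    ("checkout", "sha256:aaa", "software.amazon.awssdk:dynamodb"),
    ("search", "sha256:bbb", "software.amazon.awssdk:s3"),
    ("search", "sha256:bbb", "org.slf4j:slf4j-api"),
]

# Same shape as the audit query in the text: keep only (app, image)
# groups that contain the umbrella artifact, ranked by AWS component count.
UMBRELLA_QUERY = """
SELECT app, image_digest, COUNT(*) AS aws_deps
FROM sboms
WHERE component LIKE 'software.amazon.awssdk%'
GROUP BY app, image_digest
HAVING SUM(CASE WHEN component = 'software.amazon.awssdk:aws-sdk-java'
                THEN 1 ELSE 0 END) > 0
ORDER BY aws_deps DESC
"""


def find_umbrella_importers(rows):
    """Load rows into an in-memory table and run the audit query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sboms (app TEXT, image_digest TEXT, component TEXT)"
    )
    conn.executemany("INSERT INTO sboms VALUES (?, ?, ?)", rows)
    result = conn.execute(UMBRELLA_QUERY).fetchall()
    conn.close()
    return result


def ticket_title(app):
    """Hypothetical ticket title for one remediation candidate."""
    return f"{app}: replace the full AWS SDK with the modules it uses"


candidates = find_umbrella_importers(ROWS)
for app, digest, aws_deps in candidates:
    print(ticket_title(app), f"({aws_deps} AWS components in {digest})")
```

Only the app that ships the umbrella artifact is returned. The app that depends on a single service module is filtered out by the HAVING clause.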

Why it works

  • Cross-fleet comparison reveals outliers. Apps that import 200 MB of AWS SDK don't stand out inside their own repo — they look normal to their team. They stand out dramatically in a sorted fleet-wide query.
  • One-time investment, recurring wins. Once the modular-import pattern is established + the fix template is written, subsequent apps benefit automatically via template updates.
  • Cashes out as multiple operational metrics. Build time + image size + cold start + attack surface + patch cycle are all correlated with dependency count (concepts/dependency-count-by-language-ecosystem).
  • The SBOM corpus is the enabler. Without the data-lake shape, audits like this reduce to per-repo scripting and don't scale.
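The "outliers stand out in a fleet-wide view" point can be made concrete with a robust outlier score. The sketch below is one possible approach, not something the source describes: it flags apps whose dependency count sits far above the fleet median using the modified z-score (median and median absolute deviation). The 3.5 threshold is a common rule of thumb, and the function name and sample counts are hypothetical.

```python
from statistics import median


def flag_outliers(dep_counts, threshold=3.5):
    """Return (app, count, score) for apps far above the fleet median.

    dep_counts maps app name -> number of components in its latest SBOM.
    The score is the modified z-score: 0.6745 * (x - median) / MAD.
    """
    values = list(dep_counts.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Every app looks alike (or too few apps); nothing to rank.
        return []
    flagged = []
    for app, count in dep_counts.items():
        score = 0.6745 * (count - med) / mad
        if score > threshold:  # only unusually heavy apps, not light ones
            flagged.append((app, count, round(score, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up fleet: five ordinary apps and one that pulls in far more.
fleet = {"a": 40, "b": 45, "c": 50, "d": 55, "e": 60, "f": 400}
for app, count, score in flag_outliers(fleet):
    print(f"{app}: {count} deps (score {score})")
```

A median-based score is used here because a handful of very heavy apps would inflate a mean and standard deviation and partly hide themselves.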

Beyond AWS SDK: other umbrella-import patterns

The AWS SDK case is representative of a broader anti-pattern where a single "convenience" umbrella package drags in a heavy transitive graph. Similar cases to watch for in SBOM audits:

  • aws-sdk (Node.js v2 vs v3 modular) — v2 imported all services; v3 is modular.
  • Apache Spark uber-jars — pulling in the full Spark distribution when only core + SQL is used.
  • TensorFlow CPU vs GPU: container image sizes can differ by roughly an order of magnitude between the two builds.
  • Boost C++ libraries — linking all of Boost when only specific sub-libraries are used.
  • Kubernetes client libraries — Datadog's trace-agent case (Datadog 2026-02-18) where a single function dragged in 526 k8s.io/* packages; equivalent cross-fleet pattern at scale would surface via SBOM audit.
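The umbrella cases above lend themselves to a small rule table that an audit job can match against SBOM component names. This is a minimal sketch under assumptions: component names are taken to be version-stripped coordinates (Maven group:artifact, or bare npm package name), the rule table covers only the AWS SDK entries, and the hint wording and function name are hypothetical.

```python
# Illustrative rule table: umbrella component -> hint for the fix template.
# Extend it with the other umbrella packages relevant to your fleet.
UMBRELLA_RULES = {
    # AWS SDK for Java 2.x bundle
    "software.amazon.awssdk:aws-sdk-java":
        "depend on individual service modules (e.g. s3, dynamodb)",
    # AWS SDK for Java 1.x full SDK
    "com.amazonaws:aws-java-sdk":
        "depend on individual aws-java-sdk-<service> artifacts",
    # AWS SDK for JavaScript v2 (npm)
    "aws-sdk":
        "migrate to the modular @aws-sdk/client-* packages (v3)",
}


def audit_components(sbom_by_app):
    """Return (app, component, hint) for every umbrella import found.

    sbom_by_app maps app name -> iterable of component names.
    """
    findings = []
    for app, components in sorted(sbom_by_app.items()):
        for component in components:
            hint = UMBRELLA_RULES.get(component)
            if hint is not None:
                findings.append((app, component, hint))
    return findings


# Made-up SBOM extract for three apps.
sample = {
    "checkout": ["software.amazon.awssdk:aws-sdk-java", "org.slf4j:slf4j-api"],
    "web": ["aws-sdk", "express"],
    "search": ["software.amazon.awssdk:s3"],
}
for app, component, hint in audit_components(sample):
    print(f"{app}: {component} -> {hint}")
```

Keeping the hint next to the rule means each finding already carries the text for its fix template.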

Anti-patterns

  • Per-team size-budget policing. Works for individual teams but misses the fleet pattern. Each team's umbrella import looks reasonable until you see every team is doing it.
  • Image-size alarms without dependency-level attribution. "Your image is too big" without "because you imported the full SDK" produces no behaviour change.
  • One-time cleanup. Audit once, don't re-run. New apps created from stale templates re-introduce the bloat. The audit should be recurring (monthly/quarterly) with a dashboard that surfaces the top 20 heaviest apps.
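The recurring-audit recommendation can be sketched as a comparison between two audit runs, so a dashboard shows which apps entered or left the heaviest-N list since the last run. The function names, the per-run input shape (app name mapped to dependency count), and the sample numbers are assumptions for illustration.

```python
def top_heaviest(dep_counts, n=20):
    """Return the n apps with the most dependencies, heaviest first."""
    ranked = sorted(dep_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


def compare_runs(previous, current, n=20):
    """Report which apps entered or left the top-n list between two runs."""
    prev_top = {app for app, _ in top_heaviest(previous, n)}
    curr_top = {app for app, _ in top_heaviest(current, n)}
    return {
        "entered": sorted(curr_top - prev_top),  # new or regressed apps
        "left": sorted(prev_top - curr_top),     # remediated apps
    }


# Made-up counts from two consecutive audit runs.
last_quarter = {"a": 300, "b": 120, "c": 80}
this_quarter = {"a": 90, "b": 125, "c": 80, "d": 410}

print(compare_runs(last_quarter, this_quarter, n=2))
```

In this toy data, app "a" drops out after remediation while the newly created app "d" enters the list, which is the stale-template regression that a one-time cleanup would miss.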

Seen in
