CONCEPT

Uber-jar metadata loss¶

Definition¶

Uber-jar metadata loss is the JVM-specific failure mode in which a build process that shades (fat-jars, uber-jars) its dependencies, flattening all constituent libraries into a single output archive, destroys the per-library metadata that SBOM scanners rely on to detect individual components.

After shading, a tool like syft sees one application-exec.jar with one set of entries in its root META-INF/, instead of the dozens of nested META-INF/maven/<groupId>/<artifactId>/pom.properties files that normally mark each Maven dependency's presence. The SBOM comes back empty or dramatically undercounted — the app looks clean while actually shipping every library that was flattened into the fat jar.

The mechanism¶

Standard Maven / Gradle builds produce:

application.jar
 └── META-INF/
     ├── MANIFEST.MF
     ├── maven/com.example/myapp/pom.properties
     └── ... (only the app's own metadata)
application-dependencies/
 ├── log4j-core-2.14.0.jar
 │    └── META-INF/maven/org.apache.logging.log4j/log4j-core/pom.properties
 ├── jackson-databind-2.13.0.jar
 │    └── META-INF/maven/com.fasterxml.jackson.core/jackson-databind/pom.properties
 └── ...

The SBOM scanner walks the image, finds each jar under application-dependencies/, reads each pom.properties, and emits one SBOM entry per jar.
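The walk described above can be modelled in a few lines. This is a minimal sketch, not how syft is implemented: it assumes the only evidence consulted is the standard META-INF/maven/<groupId>/<artifactId>/pom.properties path, and the function names (parse_pom_properties, components_in_jar, scan_directory) are hypothetical.

```python
import zipfile
from pathlib import Path


def parse_pom_properties(text):
    """Parse the key=value lines of a pom.properties file into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def components_in_jar(jar_path):
    """Return (groupId, artifactId, version) for each Maven pom.properties in a jar."""
    found = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.startswith("META-INF/maven/") and name.endswith("/pom.properties"):
                props = parse_pom_properties(jar.read(name).decode("utf-8", "replace"))
                found.append(
                    (props.get("groupId"), props.get("artifactId"), props.get("version"))
                )
    return found


def scan_directory(root):
    """Walk a directory tree and collect components from every .jar file found."""
    inventory = []
    for jar_path in sorted(Path(root).rglob("*.jar")):
        inventory.extend(components_in_jar(jar_path))
    return inventory
```

Run against a directory such as application-dependencies/, this yields one tuple per dependency jar. Run against a jar whose dependency metadata was stripped, it yields at most the application's own entry.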

A shaded / fat-jar / uber-jar build produces one flat archive:

application-exec.jar
 ├── META-INF/
 │    ├── MANIFEST.MF
 │    └── maven/com.example/myapp/pom.properties   ← just the app
 ├── org/apache/logging/log4j/core/...            ← log4j classes here
 ├── com/fasterxml/jackson/databind/...           ← jackson classes here
 └── ... (all classes from all deps, merged at the archive root)

Depending on the shading plugin's configuration, the pom.properties of the constituent dependencies may be:

Dropped entirely (common in practice, for example when a filter or merge strategy discards the dependencies' META-INF/ content).
Renamed / relocated (e.g. shaded/log4j/META-INF/...), so scanners that match on the standard path miss them.
Preserved (requires explicit configuration — rare in practice).
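To see which of these three outcomes applies to a given build, one can inspect the shaded jar directly. The sketch below is an illustration under stated assumptions: metadata_outcome is a hypothetical helper, and "relocated" is approximated as "a pom.properties for this artifactId exists somewhere other than the standard path".

```python
import zipfile


def metadata_outcome(jar_path, group_id, artifact_id):
    """Classify what happened to one dependency's pom.properties inside a jar.

    Returns "preserved", "relocated", or "dropped".
    """
    standard = f"META-INF/maven/{group_id}/{artifact_id}/pom.properties"
    tail = f"/{artifact_id}/pom.properties"
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
    if standard in names:
        return "preserved"
    # Same file name under a non-standard prefix: a path-matching scanner misses it.
    if any(name.endswith(tail) for name in names):
        return "relocated"
    return "dropped"
```

For example, metadata_outcome("application-exec.jar", "org.apache.logging.log4j", "log4j-core") returning "dropped" while the log4j classes are present in the archive is the failure mode this page describes.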

Why it matters operationally¶

Zalando names this as the canonical JVM adoption gotcha:

"some SBOMs did not show any java-archive entries, because the team's build process flattened all dependencies into an uber-jar and the required metadata needed for library detection was lost." (Source: )

The app ships the vulnerable log4j-core-2.14.0 bytes, just with no pom.properties to prove it. The fleet-sweep query for "apps containing log4j 2.14" returns zero, and the app is invisible to the SBOM corpus. Log4Shell-class events become "we have to manually scan every Java team's build process to know which ones shade", which is exactly the problem the SBOM corpus was supposed to eliminate.

Zalando's recommendation: "caution when using SBOM tools and double-checking that the SBOM generation works correctly for all applications." The operational implication is that SBOM adoption at scale requires build-pipeline enforcement that SBOM output is non-empty / has a plausible dependency count for every deploy — an SBOM-validation gate.
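A validation gate of this kind could look roughly like the following. It is a sketch, not Zalando's implementation. It assumes the SBOM is in syft's JSON format with a top-level "artifacts" array whose entries carry a "type" field (with "java-archive" for JVM libraries). The min_java_components threshold and the function names are hypothetical and would need tuning per application.

```python
import json


def validate_sbom(sbom, min_java_components=1):
    """Return a list of problems found in a syft-JSON-shaped SBOM dict.

    An empty list means the SBOM passed the gate.
    """
    artifacts = sbom.get("artifacts", [])
    java = [a for a in artifacts if a.get("type") == "java-archive"]
    problems = []
    if not artifacts:
        problems.append("SBOM contains no artifacts at all")
    if len(java) < min_java_components:
        problems.append(
            f"only {len(java)} java-archive entries, "
            f"expected at least {min_java_components}"
        )
    return problems


def validate_sbom_file(path, min_java_components=1):
    """Load an SBOM JSON file from disk and run the same checks."""
    with open(path, encoding="utf-8") as fh:
        return validate_sbom(json.load(fh), min_java_components)
```

A pipeline step would fail the deploy whenever the returned list is non-empty. A fixed minimum is crude. Comparing the count against the build tool's resolved dependency list would be a stronger check, at the cost of more integration work.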

Remediation¶

Stop shading where possible. Multi-jar deployments or Spring Boot's nested-jar layout preserve per-dep metadata.
Generate the SBOM pre-shade. Run syft against the full target/ directory's dependency jars before the shading plugin runs, emit the SBOM as a build artifact, ship it alongside the shaded jar.
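Configure the shading plugin to preserve metadata. With the Maven Shade Plugin, check that no <filters> exclusion (such as a blanket META-INF/** exclude) strips the dependencies' META-INF/maven/<groupId>/<artifactId>/pom.properties files; <createDependencyReducedPom> only controls the POM generated for the shaded artifact, not the metadata inside the jar. If the files are retained under renamed paths, scanners need configuration to know where to look.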
Cross-check with binary-level fingerprinting. Tools that match class-file fingerprints against known-vulnerable libraries offer a fallback that partially recovers detection without metadata, at some false-positive cost; check whether the scanner in use (e.g. grype) supports this.
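As a rough illustration of the last remediation, the sketch below detects libraries by the package paths of their class files rather than by metadata. This is a much cruder stand-in for real fingerprinting (which would hash class contents): it recovers no version, and it fails if the shading step relocated the packages. KNOWN_PREFIXES and libraries_by_class_path are hypothetical names, and the prefix map is a hand-written example, not a real vulnerability feed.

```python
import zipfile

# Hypothetical, hand-maintained map from class-path prefix to library name.
KNOWN_PREFIXES = {
    "org/apache/logging/log4j/core/": "log4j-core",
    "com/fasterxml/jackson/databind/": "jackson-databind",
}


def libraries_by_class_path(jar_path, prefixes=KNOWN_PREFIXES):
    """Return the sorted names of known libraries whose classes appear in the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        class_names = [n for n in jar.namelist() if n.endswith(".class")]
    return sorted(
        {
            library
            for prefix, library in prefixes.items()
            if any(name.startswith(prefix) for name in class_names)
        }
    )
```

Comparing this result with the SBOM's java-archive entries gives a cheap signal: a library that shows up by class path but not in the SBOM suggests metadata was lost during shading.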

Seen in¶

— canonical wiki instance. Zalando names this as the primary SBOM-adoption failure mode they encountered on the JVM side.

concepts/sbom-software-bill-of-materials — the concept this failure mode attacks.
concepts/container-extracted-sbom — the scan locus that surfaces the failure (source-tree scans can see the dependencies before shading; container scans see only what ships).
systems/syft · systems/grype — the scanners whose detection this breaks.