CONCEPT

Test pyramid¶

Definition¶

A test pyramid (Mike Cohn, popularised by Martin Fowler) is a shape heuristic for a test suite's composition. From widest (foundation) to narrowest (apex):

Unit tests — fastest, most numerous. One module in isolation.
Component / service tests — mid-layer. One service with its dependencies stubbed or faked.
Integration tests — narrower. Exercise the boundary between your code and one real external system (DB, HTTP peer, queue, object store). Pay real-engine / real-wire cost; in return, catch bugs unit tests can't.
System / E2E / manual tests — narrowest. Full stack, expensive to write and run, flaky.

The shape encodes a cost/coverage tradeoff: each layer up is slower, flakier, and more expensive to maintain, so fewer tests should live there. Bug-detection reach runs the other way: higher-layer tests catch categories that lower ones miss (wiring, real-engine behaviour, environment), but at a high cost per test.

Worked Zalando ratio¶

Zalando Marketing Services uses roughly one integration test for every four unit tests (≈ 25%) as a rule of thumb, with the explicit caveat that it "varies from application to application". This sits near the middle of the industry spread (Google's original heuristic was 70 / 20 / 10 for small / medium / large); the specific ratio matters less than the shape.
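To make the arithmetic concrete, the sketch below expresses both heuristics as "integration tests per unit test". The class name, method name, and the 400 / 100 suite are hypothetical, and treating Google's "medium" tests as the integration layer is an assumption made only for the comparison.

```java
final class PyramidRatio {
    /** Integration tests expressed as a fraction of unit tests. */
    static double integrationToUnit(int unitTests, int integrationTests) {
        if (unitTests <= 0) {
            throw new IllegalArgumentException("unitTests must be positive");
        }
        return (double) integrationTests / unitTests;
    }

    public static void main(String[] args) {
        // Hypothetical suite that follows the ~25% rule of thumb.
        System.out.println("ZMS-style: " + integrationToUnit(400, 100));
        // 70 / 20 / 10 split, reading "medium" as integration (assumption).
        System.out.println("70/20/10:  " + integrationToUnit(70, 20));
    }
}
```

Under that reading, 70 / 20 / 10 works out to about 0.29 integration tests per unit test, which is close to the 0.25 rule of thumb.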

Common inversions (anti-pyramid)¶

Ice-cream cone: lots of manual / E2E, few unit. Slow, flaky, expensive; catches surface regressions at enormous cost per bug.
Hourglass: lots of unit + lots of E2E, missing middle. Unit tests say the pieces work and E2E tests say the whole works, but integration-layer bugs (wire-format drift, real-engine corner cases) fall through.
Square: equal counts at every layer. Usually means the lower layers aren't being invested in.
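The shapes above can be told apart mechanically from per-layer test counts. The following is a minimal sketch, not an established metric: it collapses the pyramid to three layers, and the class name, enum, and the 80% "roughly equal" threshold are all illustrative assumptions.

```java
final class SuiteShape {
    enum Shape { PYRAMID, ICE_CREAM_CONE, HOURGLASS, SQUARE, OTHER }

    /**
     * Classifies a suite from per-layer test counts.
     * Thresholds and check order are assumptions chosen for illustration.
     */
    static Shape classify(int unit, int integration, int e2e) {
        int max = Math.max(unit, Math.max(integration, e2e));
        int min = Math.min(unit, Math.min(integration, e2e));
        if (max == 0) {
            return Shape.OTHER; // empty suite: nothing to classify
        }
        // Roughly equal counts at every layer (smallest within 80% of largest).
        if (min >= 0.8 * max) {
            return Shape.SQUARE;
        }
        // Strictly narrowing towards the apex.
        if (unit > integration && integration > e2e) {
            return Shape.PYRAMID;
        }
        // Thin middle: fewer integration tests than either neighbour.
        if (integration < unit && integration < e2e) {
            return Shape.HOURGLASS;
        }
        // Top-heavy: more E2E tests than unit tests.
        if (e2e > unit) {
            return Shape.ICE_CREAM_CONE;
        }
        return Shape.OTHER;
    }
}
```

For example, classify(400, 100, 20) yields PYRAMID, while classify(500, 10, 300) yields HOURGLASS. A check like this could run in CI as a coarse warning, though the counts alone say nothing about test quality.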

Why integration tests can't replace unit tests¶

Startup cost. A unit test takes milliseconds; an integration test (IT) against a real Postgres container takes seconds. A fixed suite-time budget therefore bounds the IT count.
Failure diagnosis. Unit test failures localise; IT failures implicate the whole wire + dependency path.
Flakiness risk compounds. Each real dependency adds a failure mode. Unit tests have near-zero infra flakiness.
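The startup-cost argument is easy to quantify. The sketch below estimates how many integration tests fit into a suite-time budget once the unit tests have run. All names and figures are hypothetical, and it assumes a flat per-test cost, which ignores optimisations such as sharing one container across tests.

```java
import java.time.Duration;

final class SuiteBudget {
    /** Largest number of integration tests that fits in the time left after unit tests. */
    static long maxIntegrationTests(Duration budget, long unitTests,
                                    Duration perUnitTest, Duration perIntegrationTest) {
        if (perIntegrationTest.isZero() || perIntegrationTest.isNegative()) {
            throw new IllegalArgumentException("perIntegrationTest must be positive");
        }
        Duration remaining = budget.minus(perUnitTest.multipliedBy(unitTests));
        if (remaining.isNegative()) {
            return 0; // unit tests alone already exceed the budget
        }
        return remaining.toMillis() / perIntegrationTest.toMillis();
    }

    public static void main(String[] args) {
        // Assumed figures: 5-minute budget, 2,000 unit tests at 10 ms, 2 s per IT.
        long max = maxIntegrationTests(Duration.ofMinutes(5), 2_000,
                Duration.ofMillis(10), Duration.ofSeconds(2));
        System.out.println("Integration tests that fit: " + max);
    }
}
```

With those assumed figures the unit tests consume 20 seconds and the remaining 280 seconds leave room for 140 integration tests, so the same budget holds more than ten times as many unit tests as ITs.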

Why unit tests can't replace integration tests¶

Stub drift. A unit test that stubs the database cannot detect that the production DB rejects the query, that json_agg does not behave the way the stub assumed, or that a migration broke.
Wire-format edge cases. An HTTP peer returns 5xx, times out, or drops the connection mid-response; only a real server exposes those.
Spring context / DI wiring. A unit test doesn't exercise the bean graph; an IT does. Even one contextLoads() IT detects wiring regressions and Flyway migration failures (Zalando call-out).
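As an illustration of the wire-format point, the sketch below runs client code against a real socket instead of a stub. It swaps in the JDK's built-in com.sun.net.httpserver.HttpServer as a throwaway local peer, which is a stand-in for this example and not the Testcontainers setup the source describes. The class name, method names, the /orders path, and the 2-second timeouts are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class PeerStatusProbe {
    /** Hypothetical client code under test: returns the peer's HTTP status. */
    static int fetchStatus(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    /** Starts a local server that always answers 503, then calls it over a real socket. */
    static int statusFromFailingPeer() {
        HttpServer server = null;
        try {
            // Port 0 asks the OS for any free port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/orders", exchange -> {
                exchange.sendResponseHeaders(503, -1); // -1: no response body
                exchange.close();
            });
            server.start();
            int port = server.getAddress().getPort();
            return fetchStatus(URI.create("http://127.0.0.1:" + port + "/orders"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while calling the peer", e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }
}
```

Because the request travels through a real HTTP stack, the client's timeout and status handling are exercised as they would be in production. A stubbed client only returns whatever the test author thought to script.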

Seen in¶

— Zalando ZMS anchors its testing discipline on Fowler's pyramid; uses ~25% IT to unit as the team heuristic. Canonicalises the structural role of the pyramid in justifying Testcontainers investment.

concepts/first-test-principles — the FIRST properties every layer still has to satisfy.
concepts/automated-vs-manual-testing-complementarity — the manual-tests layer's residual role.
patterns/property-based-testing — a technique that thickens the unit layer with more coverage per test.
patterns/real-docker-container-over-in-memory-fake — the integration-layer implementation choice.
patterns/failsafe-integration-test-separation — Maven plumbing that runs the two layers in different phases.