CONCEPT Cited by 1 source

Test pyramid

Definition

A test pyramid (Mike Cohn, popularised by Martin Fowler) is a shape heuristic for a test suite's composition. From widest (foundation) to narrowest (apex):

  1. Unit tests — fastest, most numerous. One module in isolation.
  2. Component / service tests — mid-layer. One service with its dependencies stubbed or faked.
  3. Integration tests — narrower. Exercise the boundary between your code and one real external system (DB, HTTP peer, queue, object store). Pay real-engine / real-wire cost; in return, catch bugs unit tests can't.
  4. System / E2E / manual tests — narrowest. Full stack, expensive to write and run, flaky.

The shape encodes a cost/coverage tradeoff: each layer up is slower, flakier, and more expensive to maintain, so fewer tests should live there. Bug-detection reach runs the other way: higher-layer tests catch categories of bug that lower ones miss (wiring, real-engine behaviour, environment), but at a high cost per test.

Worked Zalando ratio

Zalando Marketing Services uses ≈ 25% integration tests relative to unit tests as a rule of thumb (roughly one integration test per four unit tests), with the explicit caveat that it "varies from application to application". This is close to Google's 70 / 20 / 10 heuristic for small / medium / large tests, which implies about 29% medium tests relative to small ones; the specific ratio matters less than the shape. (Source: sources/2021-02-24-zalando-integration-tests-with-testcontainers)
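
The arithmetic behind the rule of thumb can be sketched in a few lines of Python. The function names and the example count of 400 unit tests are hypothetical, chosen only for illustration; the 25% figure and the 70 / 20 / 10 split are the ones quoted above.

```python
def integration_target(unit_tests: int, ratio: float = 0.25) -> int:
    """Approximate integration-test count for a given unit-test count.

    ratio is integration tests relative to unit tests (0.25 = one per four).
    """
    return round(unit_tests * ratio)


def relative_to_unit(unit_share: float, integration_share: float) -> float:
    """Convert a percentage split into an 'integration relative to unit' ratio."""
    return integration_share / unit_share


# Hypothetical suite with 400 unit tests under the 25% rule of thumb.
print(integration_target(400))             # 100
# The 70 / 20 / 10 split expressed the same way.
print(round(relative_to_unit(70, 20), 2))  # 0.29
```

Treat the result as a sanity check on the suite's shape rather than a target to hit exactly, since the ratio varies from application to application.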

Common inversions (anti-pyramid)

  • Ice-cream cone: lots of manual / E2E, few unit. Slow, flaky, expensive; catches surface regressions at enormous cost per bug.
  • Hourglass: lots of unit + lots of E2E, missing middle. Unit tests say the pieces work; E2E say the whole works; but integration-layer bugs (wire format drift, real-engine corner cases) fall through.
  • Square: equal counts at every layer. Usually means the lower layers aren't being invested in.
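
As a rough illustration, the shapes above can be told apart from per-layer test counts. This is a minimal sketch, not an established metric: the function name `classify_shape`, the three-layer simplification (unit, middle, E2E), and the 20% tolerance used to call counts "equal" are all assumptions.

```python
def classify_shape(unit: int, middle: int, e2e: int, tolerance: float = 0.2) -> str:
    """Label a suite's shape from its unit / middle-layer / E2E test counts.

    tolerance is an assumed threshold: layers within 20% of the largest
    layer are treated as "equal" for the square check.
    """
    counts = (unit, middle, e2e)
    largest = max(counts)
    if largest == 0:
        return "empty"
    # Square: every layer is roughly as large as the biggest one.
    if min(counts) >= (1 - tolerance) * largest:
        return "square"
    # Pyramid: counts shrink strictly from the base to the apex.
    if unit > middle > e2e:
        return "pyramid"
    # Ice-cream cone: counts grow strictly from the base to the apex.
    if e2e > middle > unit:
        return "ice-cream cone"
    # Hourglass: the middle layer is thinner than both ends.
    if middle < unit and middle < e2e:
        return "hourglass"
    return "other"


print(classify_shape(700, 200, 100))  # pyramid
print(classify_shape(50, 100, 400))   # ice-cream cone
print(classify_shape(500, 20, 300))   # hourglass
print(classify_shape(100, 95, 90))    # square
```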

Why integration tests can't replace unit tests

  • Startup cost. A unit test takes milliseconds; an integration test (IT) against a real Postgres container takes seconds. With a fixed suite-time budget, the IT count has to stay bounded.
  • Failure diagnosis. Unit test failures localise; IT failures implicate the whole wire + dependency path.
  • Flakiness risk compounds. Each real dependency adds a failure mode. Unit tests have near-zero infra flakiness.
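
The startup-cost and flakiness points can be put into numbers. The sketch below uses hypothetical per-test durations (5 ms per unit test, 2 s per integration test) and a hypothetical 1% flake rate per real dependency, and it assumes dependencies fail independently; none of these figures come from the source.

```python
import math


def suite_seconds(unit_count: int, it_count: int,
                  unit_s: float = 0.005, it_s: float = 2.0) -> float:
    """Total sequential run time, given assumed per-test durations in seconds."""
    return unit_count * unit_s + it_count * it_s


def pass_probability(flake_rates: list[float]) -> float:
    """Chance a test passes when every real dependency must behave.

    Assumes each dependency flakes independently with the given probability.
    """
    return math.prod(1.0 - p for p in flake_rates)


# 2000 unit tests add about 10 s; 100 integration tests add about 200 s.
print(round(suite_seconds(2000, 100), 1))              # 210.0
# Three real dependencies at 1% flake rate each.
print(round(pass_probability([0.01, 0.01, 0.01]), 4))  # 0.9703
```

Under these assumptions the integration tests dominate the run time despite being twenty times fewer, and a test with three real dependencies fails spuriously about 3% of the time.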

Why unit tests can't replace integration tests

  • Stub drift. A unit test that stubs the database cannot detect that the production DB rejects the query plan, that json_agg doesn't do what the stub said, or that a migration broke.
  • Wire-format edge cases. An HTTP peer returns 5xx, times out, or drops the connection mid-response; only a real server exposes those.
  • Spring context / DI wiring. A unit test doesn't exercise the bean graph; an IT does. Even one contextLoads() IT detects wiring regressions and Flyway migration failures (Zalando call-out).
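
Stub drift can be shown in miniature. The sketch below is an illustration under stated assumptions, not the Zalando setup: an in-memory SQLite database stands in for the real engine (the source describes a containerised database), and `count_active_users`, the `users` table, and the deliberately misspelled column are hypothetical.

```python
import sqlite3
from unittest.mock import MagicMock


def count_active_users(conn) -> int:
    # Hypothetical repository function. The column name is misspelled
    # on purpose ("is_activ" instead of "is_active").
    row = conn.execute("SELECT COUNT(*) FROM users WHERE is_activ = 1").fetchone()
    return row[0]


def unit_test_with_stub() -> bool:
    # The stub returns whatever it was told to, so the bad SQL is never parsed.
    conn = MagicMock()
    conn.execute.return_value.fetchone.return_value = (3,)
    return count_active_users(conn) == 3


def integration_test_with_real_engine() -> bool:
    # A real engine parses the query against the real schema and rejects it.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, is_active INTEGER)")
        conn.executemany("INSERT INTO users (is_active) VALUES (?)", [(1,), (1,), (0,)])
        return count_active_users(conn) == 2
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


print(unit_test_with_stub())                # True
print(integration_test_with_real_engine())  # False
```

The stubbed test stays green while the query is broken; only the run against a real engine surfaces the schema mismatch.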

Seen in
