CONCEPT

Test cluster as break-things environment¶

Definition¶

A break-things test cluster is a pre-production environment that is deliberately scaled, versioned, and configured to match production, but whose explicit purpose is to let engineers push load until the system fails. That discipline is unsafe in production because it would degrade service for real customers.

Zalando's framing, verbatim:

"In order to really push our services to the edge, we wanted to run the load testing system in our test cluster, as this enables us to break things when necessary without causing customer impact."

The key property: discovery past failure¶

This concept answers a specific question that live-load-test-in-production cannot answer:

"What happens to the system when load exceeds capacity?"

The governing constraint of live-prod load testing is to abort on customer impact. You cannot observe post-saturation failure modes (retry storms, queue overflow behaviour, degraded-dependency fallbacks) because the abort fires first.

A break-things cluster has no abort constraint (no real customers), so the operator can:

Observe the first bottleneck, fix it, and re-run.
Run multiple saturation experiments per week.
Validate fail-safe behaviour (circuit breakers tripping, bulkheads isolating, degraded-mode fallbacks engaging).
Measure resource behaviour that is unsafe to provoke in production (memory growth at 2x load, disk-spill behaviour, thread-pool exhaustion) and that a customer-facing run would abort before reaching.
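The difference between the two abort regimes can be sketched as a stepped load ramp. This is a minimal illustration, not Zalando's tooling: `ramp`, `toy_service`, and the 1% abort threshold are hypothetical names and values, and the toy service simply fails every request beyond a fixed capacity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    rps: int           # offered load for this step
    error_rate: float  # fraction of failed requests observed at this step


def ramp(
    measure_error_rate: Callable[[int], float],
    start_rps: int,
    step_rps: int,
    max_rps: int,
    abort_threshold: Optional[float],
) -> List[StepResult]:
    """Step the offered load upward and record the error rate at each step.

    abort_threshold=None models the break-things cluster: the ramp never stops early.
    A float models a live-prod run: stop at the first step that exceeds the threshold.
    """
    results: List[StepResult] = []
    for rps in range(start_rps, max_rps + 1, step_rps):
        rate = measure_error_rate(rps)
        results.append(StepResult(rps, rate))
        if abort_threshold is not None and rate > abort_threshold:
            break
    return results


def toy_service(capacity_rps: int) -> Callable[[int], float]:
    """Hypothetical stand-in for a real measurement: requests beyond capacity fail."""
    def measure(rps: int) -> float:
        return max(0.0, (rps - capacity_rps) / rps)
    return measure


service = toy_service(capacity_rps=1000)
prod_like = ramp(service, 500, 250, 2000, abort_threshold=0.01)
break_things = ramp(service, 500, 250, 2000, abort_threshold=None)

print([r.rps for r in prod_like])     # ramp stops at the first step past capacity
print([r.rps for r in break_things])  # ramp continues through every step
```

With an abort threshold the run yields only one data point past saturation; without it, every step up to `max_rps` is recorded, which is the data a failure-mode investigation needs.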

The required invariant: production parity¶

A break-things cluster only produces transferable insight if it is close enough to production that failure modes reproduce. Parity is expensive:

Application parity — versions + replica count + CPU/mem limits per service. Automatable (see concepts/production-version-cloning-for-load-test).
Infrastructure parity — node types, databases, shared event buses (Zalando names Nakadi). Requires cross-team coordination.
Dependency parity — external third parties. Real instances are not available to the test cluster, so they are mocked. See Hoverfly and patterns/header-routed-mock-vs-real-dependency.
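Application parity is the layer described above as automatable. A minimal sketch of a drift check follows, assuming each environment's deployment state has already been exported into a simple mapping. `ServiceSpec`, `parity_drift`, and the service names and values are hypothetical and do not come from the Zalando post.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class ServiceSpec:
    version: str
    replicas: int
    cpu_limit: str
    memory_limit: str


def parity_drift(prod: Dict[str, ServiceSpec], test: Dict[str, ServiceSpec]) -> List[str]:
    """List every field where the test cluster differs from production."""
    drift: List[str] = []
    for name, prod_spec in sorted(prod.items()):
        test_spec = test.get(name)
        if test_spec is None:
            drift.append(f"{name}: missing in test cluster")
            continue
        for field in fields(ServiceSpec):
            prod_value = getattr(prod_spec, field.name)
            test_value = getattr(test_spec, field.name)
            if prod_value != test_value:
                drift.append(f"{name}.{field.name}: prod={prod_value} test={test_value}")
    return drift


# Hypothetical example data.
prod = {
    "checkout": ServiceSpec("1.4.2", 12, "2", "4Gi"),
    "ledger": ServiceSpec("3.0.1", 6, "1", "2Gi"),
}
test = {
    "checkout": ServiceSpec("1.4.2", 4, "2", "4Gi"),
}

for line in parity_drift(prod, test):
    print(line)
```

Running a check like this before each load test makes parity gaps visible up front, rather than leaving them to be discovered when a result fails to transfer to production.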

Zalando's post is explicit that the parity effort is ongoing and imperfect: "Several infrastructure components like cluster node type, databases, centrally managed event queues had to be adjusted for similarity with the production environment. This required a lot of communication effort and alignment with teams managing the services."

Complementary, not replacement¶

Zalando's conclusion is explicit:

"Since these load tests are conducted in a non-production environment, we could stress the services till they fail. In combination with load tests in production, this was essential for preparing our production services for higher load."

The two disciplines answer different questions:

	Break-things test cluster	Live-prod load test
Question	Where does it fail?	What's today's sustainable capacity?
Abort constraint	None	Customer impact
Fidelity	Good (parity cost)	Perfect (it's prod)
Run frequency	High (no prod risk)	Low (every run costs)
Output	Bottleneck list + failure modes	Capacity number with confidence

A mature org runs both. The break-things cluster explores; live-prod load tests verify.

Anti-patterns¶

Scaling the test cluster smaller than prod — failure modes don't reproduce, and the test becomes a test of the test cluster, not of production.
Skipping infrastructure-layer parity — the application tier matches prod, but the test cluster's undersized shared database or event bus saturates earlier than the production one would, so the run reports a bottleneck that prod would not hit: a false negative on peak readiness.
Reading break-things results as capacity numbers — the purpose is failure-mode discovery, not peak-minute commitment. Those come from live-prod load tests.
Treating the cluster as a permanent staging environment — it's a load-test substrate; using it for routine QA dilutes its break-things charter.
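The infrastructure-parity anti-pattern can be made concrete with a small arithmetic sketch. It assumes, as a simplification, that every request passes through each component in series, so the system saturates at the smallest component capacity. The component names and capacity figures are invented for illustration.

```python
from typing import Dict, Tuple


def saturation_point(capacities_rps: Dict[str, int]) -> Tuple[str, int]:
    """Return (component, rps) for the component that saturates first."""
    name = min(capacities_rps, key=capacities_rps.get)
    return name, capacities_rps[name]


# Hypothetical capacities in requests per second.
prod = {"app": 5000, "database": 8000, "event_bus": 10000}
test_undersized_db = {"app": 5000, "database": 3000, "event_bus": 10000}

print(saturation_point(prod))                # ('app', 5000)
print(saturation_point(test_undersized_db))  # ('database', 3000)
```

In this toy model the test cluster reports a database bottleneck at 3000 rps, while production would saturate in the application tier at 5000 rps. The test finding is real for the test cluster but does not describe production.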

Seen in¶

Canonical instance: Zalando Payments' test cluster, the Load Test Conductor's target environment; explicitly framed as the sibling discipline to live-prod load testing for Cyber Week preparation.

patterns/live-load-test-in-production — the complementary discipline in production.
patterns/declarative-load-test-conductor — the automation pattern that makes break-things runs cheap enough to do often.
concepts/production-version-cloning-for-load-test — the parity invariant.