Zalando Load Test Conductor¶
Definition¶
Load Test Conductor is a Go microservice built by Zalando's Payments department to own the full lifecycle of an end-to-end load test. It clones production's deployed versions into a test cluster, scales applications to match production's resource allocation across Kubernetes and AWS ECS, drives Locust workers with a KPI-driven closed-loop algorithm toward a target orders-per-minute, and afterwards scales everything back down and cleans up test data. It exposes a declarative, Kubernetes-inspired API, so that "executing a load test is now just one API call away". Source: sources/2021-03-01-zalando-building-an-end-to-end-load-test-automation-system-on-top-of-kubernetes.
What makes it distinctive¶
- Declarative API. "Our service design was heavily influenced by what Kubernetes popularized for infrastructure management. We wanted our system to be a declarative system." The client describes the desired end state of a load test — target orders-per-minute, ramp-up time, plateau time, applications in scope — not the imperative steps. The conductor reconciles.
- Multi-substrate orchestrator. Scales applications in Kubernetes (via the Kubernetes client) and in AWS ECS within a single load-test run. Zalando Payments' landscape spans both substrates — the conductor hides the heterogeneity.
- Production-version cloning. The Deployer subcomponent queries Zalando's Continuous Delivery Platform (CDP) via the Kubernetes client to find the exact versions currently deployed in production, then triggers deployments of those versions in the test cluster. This is opt-out: developers can disable it to test a feature branch.
- KPI-driven ramp-up algorithm. The conductor steers Locust workers on a 60-second cadence against a target orders-per-minute KPI rather than a fixed users→orders ratio. See patterns/kpi-closed-loop-load-ramp-up for the algorithm.
- Lifecycle owner, not just a test runner. Five named phases: (1) deploy production versions; (2) scale up to match prod resource allocation; (3) generate load via Locust; (4) scale down to pre-test state (cost saving); (5) clean up test data (delete simulator-generated orders).
- Multiple invocation paths. Developers trigger runs manually via the conductor's API; a Kubernetes CronJob triggers scheduled runs. Both paths hit the same API, so the declarative contract is the single entry point.
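The five lifecycle phases can be sketched as an ordered pipeline. The phase names follow the post; the code structure is illustrative, not Zalando's actual implementation:

```go
package main

import "fmt"

// phase is one step of a hypothetical run pipeline mirroring the five
// phases the post names; each phase must succeed before the next starts.
type phase struct {
	name string
	run  func() error
}

func runLoadTest() error {
	phases := []phase{
		{"deploy production versions", func() error { return nil }},
		{"scale up to production allocation", func() error { return nil }},
		{"generate load via Locust", func() error { return nil }},
		{"scale down to pre-test state", func() error { return nil }},
		{"clean up test data", func() error { return nil }},
	}
	for i, p := range phases {
		fmt.Printf("phase %d: %s\n", i+1, p.name)
		if err := p.run(); err != nil {
			return fmt.Errorf("phase %q failed: %w", p.name, err)
		}
	}
	return nil
}

func main() {
	if err := runLoadTest(); err != nil {
		fmt.Println("load test failed:", err)
	}
}
```

The last two phases are what make the conductor a lifecycle owner rather than a test runner: scale-down and cleanup run even though the "test" is already over.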
Internal components¶
- Deployer — reads production deployment artifacts from CDP via Kubernetes client; triggers test-cluster deployments; waits for rollout completion.
- Scaler — scales Kubernetes deployments + AWS ECS services to target resource / replica counts; reverts after test.
- Load generator driver — 60-second loop that polls Locust status, calculates orders-per-minute drift from target, computes new hatch rate + user count, pushes to Locust API.
- Cleanup component — deletes simulator-generated orders and other known test data after the run.
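The load generator driver's 60-second control step can be sketched as a simple proportional adjustment. The gain, cap, and bootstrap values below are assumptions — the post describes the closed loop (compare observed orders-per-minute to target, recompute user count, push to Locust) but not the exact formula:

```go
package main

import "fmt"

// nextUserCount is a hypothetical controller step: every 60 seconds the
// driver compares observed orders-per-minute to the target and scales the
// Locust user count toward closing the gap, assuming orders scale roughly
// linearly with users near the current operating point.
func nextUserCount(current int, observedOPM, targetOPM float64) int {
	if current <= 0 || observedOPM <= 0 {
		return current + 10 // hypothetical bootstrap step before any signal
	}
	next := int(float64(current) * targetOPM / observedOPM)
	// Cap the per-tick change so the ramp-up stays gradual.
	if next > current*2 {
		next = current * 2
	}
	return next
}

func main() {
	fmt.Println(nextUserCount(100, 500, 1000)) // halfway to target: double users
}
```

The result, plus a hatch rate, would then be pushed to the Locust API on each tick.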
Substrate layout at Zalando¶
- NodePool A — Locust controller + workers, the Load Test Conductor itself, Hoverfly mocks.
- NodePool B — Zalando Payments microservices under test.
- AWS ECS — non-Kubernetes components of the Payment platform.
One declarative load-test API call fans out scaling actions across two Kubernetes node pools and an ECS cluster simultaneously.
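The fan-out across heterogeneous substrates suggests a small abstraction like the following. The interface and type names are illustrative, not from the post:

```go
package main

import "fmt"

// Scaler abstracts one deployment substrate; the conductor fans a single
// desired state out to all of them. Illustrative interface only.
type Scaler interface {
	Scale(service string, replicas int) error
	Name() string
}

type k8sScaler struct{ nodePool string }

func (s k8sScaler) Scale(service string, replicas int) error { return nil }
func (s k8sScaler) Name() string                             { return "kubernetes/" + s.nodePool }

type ecsScaler struct{ cluster string }

func (s ecsScaler) Scale(service string, replicas int) error { return nil }
func (s ecsScaler) Name() string                             { return "ecs/" + s.cluster }

// scaleAll applies one desired replica count across every substrate.
func scaleAll(scalers []Scaler, service string, replicas int) error {
	for _, sc := range scalers {
		if err := sc.Scale(service, replicas); err != nil {
			return fmt.Errorf("%s: %w", sc.Name(), err)
		}
	}
	return nil
}

func main() {
	scalers := []Scaler{k8sScaler{nodePool: "pool-b"}, ecsScaler{cluster: "payments"}}
	fmt.Println(scaleAll(scalers, "payment-service", 20))
}
```

Reverting after the test is the same call with the recorded pre-test replica counts.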
How applications get mocks¶
The conductor does not implement mock-switching itself. Instead, it depends on Skipper's header-based routing (see patterns/header-routed-mock-vs-real-dependency): every load-test request is header-tagged; Skipper routes the request to either the real dependency or a Hoverfly mock deployed in NodePool A. This lets load tests and other tests share the cluster.
Evaluation¶
Human-in-the-loop: a developer reads Grafana dashboards (latency, throughput, response-code rate) during + after the run. SLO-breach alerts fire automatically during execution. Authors acknowledge "Test results have to be manually evaluated to decide if the outcome is successful or not, which is sufficient for us for the time being."
Operational friction (from the post)¶
- CI races. Unrelated production deployments mid-test could cause the service to point at under-resourced pods. Not fully solved; named as an operational hazard.
- Parity drift. Cluster node types, databases, and centrally managed event queues (Nakadi) had to be adjusted toward production parity, requiring cross-team alignment.
Seen in¶
- sources/2021-03-01-zalando-building-an-end-to-end-load-test-automation-system-on-top-of-kubernetes — the canonical introduction. ~2 hour load-test duration for the Payment system; CronJob-scheduled + developer-triggered; test cluster framed as "a test environment where we can break things".
Related¶
- systems/locust — the traffic generator the conductor drives.
- systems/hoverfly — the mocking layer the conductor's scaler brings up.
- systems/skipper-proxy — the ingress layer that carries the header-based mock/real routing.
- systems/kubernetes · systems/amazon-ecs — the two deployment substrates orchestrated simultaneously.
- systems/grafana — the evaluation surface.
- patterns/declarative-load-test-conductor — the generalised pattern this system instantiates.
- patterns/kpi-closed-loop-load-ramp-up — the ramp-up algorithm.
- patterns/live-load-test-in-production — the sibling discipline; this conductor is the break-things complement.
- companies/zalando