# Declarative load-test conductor

## What this is
Declarative load-test conductor is the pattern of building a dedicated, long-lived microservice that owns the complete lifecycle of a load test — deploying production versions, scaling applications, driving the load generator, scaling back down, cleaning up — and exposes that capability via a single declarative API: the client describes the target state of a load test (target KPI, ramp-up, plateau duration, apps in scope) rather than the imperative steps to reach it.
The pattern generalises Zalando's Load Test Conductor (Source: sources/2021-03-01-zalando-building-an-end-to-end-load-test-automation-system-on-top-of-kubernetes).
## The canonical shape
The conductor owns five named phases, executed per load-test run:
- Deploy production versions into the test cluster. Use the CI/CD platform's API to find the currently-deployed production artifacts, trigger test-cluster deployments, wait for rollout. (See concepts/production-version-cloning-for-load-test.)
- Scale applications to production's replica count + resource allocation. Support multiple substrates simultaneously (Zalando: Kubernetes + AWS ECS).
- Generate load via a distributed traffic tool, steered by a KPI-driven closed-loop algorithm against a business KPI target.
- Scale back down to the pre-test state as a cost mitigation.
- Clean up test data (delete simulator-generated orders, payments, audit records, Nakadi events).
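The five phases above can be sketched as one orchestration loop. This is a minimal Python sketch, not Zalando's published code — every class and method name here is hypothetical — but it shows the key structural property: scale-down and cleanup run even when the load phase fails.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoadTestSpec:
    """Mirror of the declarative contract (field names illustrative)."""
    target_orders_per_minute: int
    ramp_up_minutes: int
    plateau_minutes: int
    applications: List[str]
    use_production_versions: bool = True


class Conductor:
    """Hypothetical orchestration loop over the five phases."""

    def __init__(self, deployer, scaler, load_driver, cleaner):
        self.deployer = deployer
        self.scaler = scaler
        self.load_driver = load_driver
        self.cleaner = cleaner

    def run(self, spec: LoadTestSpec) -> None:
        pre_state = None
        try:
            if spec.use_production_versions:
                # Phase 1: deploy current production artifacts to the test cluster
                self.deployer.deploy_production_versions(spec.applications)
            # Phase 2: scale to production counts, remembering the prior state
            pre_state = self.scaler.scale_to_production(spec.applications)
            # Phase 3: KPI closed-loop ramp-up, then hold the plateau
            self.load_driver.drive(spec)
        finally:
            if pre_state is not None:
                # Phase 4: revert to the pre-test state (cost mitigation)
                self.scaler.revert(pre_state)
            # Phase 5: delete simulator-generated downstream state
            self.cleaner.clean_up(spec.applications)
```

The `try/finally` is the point: phases 4 and 5 are obligations, not best-effort follow-ups.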
## The declarative contract
Client's request:
    targetOrdersPerMinute: <N>
    rampUpMinutes: <N>
    plateauMinutes: <N>
    applications: [<svc-a>, <svc-b>, ...]
    useProductionVersions: true | false  # feature-branch exception
The conductor figures out the rest.
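A declarative endpoint's first job is rejecting underspecified requests and applying defaults. A minimal sketch — the field names follow the contract above, but the helper itself is hypothetical, not Zalando's actual API code:

```python
def parse_spec(payload: dict) -> dict:
    """Validate a declarative load-test request and apply defaults.

    Hypothetical helper: raises on missing required fields, and defaults
    useProductionVersions to true (the common, non-feature-branch case).
    """
    required = {"targetOrdersPerMinute", "rampUpMinutes",
                "plateauMinutes", "applications"}
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # The conductor figures out the rest from this target state.
    return {**payload,
            "useProductionVersions": payload.get("useProductionVersions", True)}
```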
## Required subsystems
- Deployer — CI/CD platform client, version-discovery, test-cluster deployment driver.
- Scaler — multi-substrate scaler (Kubernetes API + AWS ECS API + ...). Captures pre-test state to revert.
- Load generator driver — polls the traffic tool's API; runs the KPI closed-loop algorithm; pushes hatch-rate / user-count updates.
- Cleanup — knows which downstream state the simulator creates and how to delete it.
- API + scheduler — single declarative endpoint; also accepts a Kubernetes CronJob trigger hitting the same endpoint.
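For the load generator driver, one plausible shape of the KPI closed loop is a proportional controller: compare the measured KPI against the target and scale the simulated user count by the relative error. A sketch — the function name, gain, and update rule are assumptions for illustration, not Zalando's published algorithm:

```python
def adjust_user_count(current_kpi: float, target_kpi: float,
                      user_count: int, gain: float = 0.5,
                      min_users: int = 1) -> int:
    """One control step: nudge the simulated user count toward the
    business-KPI target (e.g. orders per minute). Hypothetical sketch."""
    if current_kpi <= 0:
        return user_count + 1  # no KPI signal yet: probe upward
    error = (target_kpi - current_kpi) / target_kpi
    return max(min_users, round(user_count * (1 + gain * error)))
```

Polled on a fixed interval against the traffic tool's stats API, a step like this converges on the KPI target without the client ever having to specify user counts or hatch rates.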
## Why a microservice, not a script
- Lifecycle state has to persist across retries. Scale-up phase of one run can outlive a single-shot Jenkins job.
- Multi-substrate orchestration is stateful. Reverting an ECS service's desired count requires remembering what it was before the test.
- Concurrent-run guarding. A long-running service can prevent two load tests colliding; a script cannot.
- API surface versus CLI surface. Developers + a CronJob + a Slackbot + CI all want the same capability; an HTTP API serves all of them uniformly.
- Observability ownership. The conductor becomes the authoritative record of what happened on each run.
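The statefulness argument can be made concrete: a multi-substrate scaler must record each service's pre-test desired count before touching it, or the revert is impossible. A minimal sketch, assuming per-substrate clients with get/set replica methods (all names hypothetical):

```python
class MultiSubstrateScaler:
    """Hypothetical Scaler subsystem: one client per substrate
    (e.g. Kubernetes, AWS ECS); pre-test replica counts are captured
    so the scale-down phase can revert exactly."""

    def __init__(self, clients: dict):
        self.clients = clients  # e.g. {"kubernetes": ..., "ecs": ...}

    def scale_up(self, services: list) -> dict:
        pre_state = {}
        for svc in services:
            client = self.clients[svc["substrate"]]
            # Remember what it was before the test...
            pre_state[svc["name"]] = (svc["substrate"],
                                      client.get_replicas(svc["name"]))
            # ...then match production's replica count.
            client.set_replicas(svc["name"], svc["production_replicas"])
        return pre_state

    def revert(self, pre_state: dict) -> None:
        for name, (substrate, replicas) in pre_state.items():
            self.clients[substrate].set_replicas(name, replicas)
```

The returned `pre_state` is exactly the lifecycle state that has to outlive any single-shot job.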
## The invocation paths (all hit the same API)
- Manual developer trigger — curl / UI button, typically with `useProductionVersions: false` against a feature branch.
- Kubernetes CronJob — for scheduled regression runs (see patterns/scheduled-cron-triggered-load-test).
- Pre-release gate — CI step before production rollout, also via the same API.
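The CronJob path needs no dedicated scheduler code in the conductor: a vanilla Kubernetes CronJob that POSTs the same declarative body is enough. A sketch of what such a manifest could look like — the name, schedule, endpoint URL, and request body below are placeholders, not Zalando's actual configuration:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: weekly-load-test              # placeholder name
spec:
  schedule: "0 3 * * 1"               # e.g. every Monday at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trigger
              image: curlimages/curl
              args:
                - "-X"
                - "POST"
                - "http://load-test-conductor/load-tests"   # placeholder endpoint
                - "-H"
                - "Content-Type: application/json"
                - "-d"
                - '{"targetOrdersPerMinute": 1000, "rampUpMinutes": 10,
                    "plateauMinutes": 30, "applications": ["svc-a"],
                    "useProductionVersions": true}'
```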
## Relation to other patterns
- patterns/live-load-test-in-production — the in-production sibling discipline. A declarative conductor for pre-prod complements, not replaces, in-prod load testing. Zalando runs both; the pre-prod cluster is the break-things environment.
- patterns/kpi-closed-loop-load-ramp-up — how the conductor drives the load generator during the Generate load phase.
- patterns/mock-external-dependencies-for-isolated-load-test — how the conductor's scale-up phase brings up the mocking layer next to the services under test.
- patterns/scheduled-cron-triggered-load-test — how recurring runs are initiated against the same API.
## When to apply
- Microservices landscape of meaningful size (Zalando: 1,122 in-scope apps out of 4,000+ total). The parity + orchestration burden is too high for scripts.
- Multiple deployment substrates that must be scaled in lockstep (Zalando: Kubernetes + AWS ECS).
- A recurring forcing function like Cyber Week that funds the investment. See patterns/annual-peak-event-as-capability-forcing-function.
## When not to
- Monolith or small microservice footprint. Build-vs-buy favours a simpler scripted harness.
- No pre-production environment with production parity. The conductor only adds value if the cluster it drives is close enough to prod that results transfer.
- No business KPI that can anchor the load shape. The closed-loop ramp-up is what gives this pattern teeth; without a target KPI, simpler ramp schedules suffice.
## Operational friction (honest)
- Unrelated production deploys during a test can race with the Scaler's version snapshot. Zalando names this as unsolved in the post.
- Infrastructure-parity work is manual — the conductor handles application layer; databases, node types, shared event buses require cross-team negotiation.
- Evaluation is often manual — Zalando notes the pass/fail call is read by a human from Grafana.
## Seen in
- sources/2021-03-01-zalando-building-an-end-to-end-load-test-automation-system-on-top-of-kubernetes — canonical instance: Zalando Payments department's Load Test Conductor as the Cyber-Week-prep automation for the payment + checkout microservice landscape.
## Related
- concepts/declarative-load-test-api — the API-design underpinning.
- systems/zalando-load-test-conductor — canonical implementation.
- patterns/kpi-closed-loop-load-ramp-up — the ramp-up mechanism.
- patterns/live-load-test-in-production — the sibling discipline in production.
- companies/zalando