CONCEPT Cited by 1 source
KPI-driven load ramp-up¶
Definition¶
KPI-driven load ramp-up is a load-test control strategy in which the ramp schedule targets a business KPI (orders-per-minute, reservations-per-second, checkouts-per-minute) rather than a technical quantity such as concurrent users or requests-per-second, and in which the worker count is recomputed every cycle from the observed KPI using closed-loop feedback.
It is the opposite of fixed users-to-requests ramp-up, which assumes a constant ratio between concurrent users and the KPI of interest.
Why the distinction matters¶
Under load, the ratio of concurrent users to business-KPI throughput is not constant. Latency grows with saturation, retries multiply, queueing stretches tail latency, and real-user behaviour stops scaling linearly with added concurrency. A load-test ramp that schedules users therefore produces an unpredictable curve on the business-KPI axis.
Concretely: "100 users → 10 orders/minute at full health" may degrade to "500 users → 12 orders/minute" when the system is saturated. A ramp that schedules concurrent users reaches its target user count but never its target orders/minute, silently under-loads the system, and ships a false-negative "ready for peak" signal.
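The arithmetic behind this example can be made explicit. The sketch below uses only the two illustrative operating points quoted above; the function names are hypothetical and exist purely for illustration.

```python
def users_per_order(users: int, orders_per_minute: float) -> float:
    """Concurrent users needed per order/minute at one observed operating point."""
    return users / orders_per_minute


def naive_users_for_target(healthy_ratio: float, target_opm: float) -> float:
    """Open-loop estimate: assumes the healthy ratio holds at every load level."""
    return healthy_ratio * target_opm


healthy = users_per_order(100, 10)    # 10 users per order/minute
saturated = users_per_order(500, 12)  # roughly 41.7 users per order/minute

# A fixed-ratio plan for 12 orders/minute would schedule 120 users,
# while the saturated system in the example needed 500 to reach that rate.
planned_users = naive_users_for_target(healthy, 12)
shortfall_factor = 500 / planned_users
```

In this example the fixed-ratio plan under-provisions users by a factor of about four, which is the gap a user-scheduled ramp never notices.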
The closed-loop algorithm¶
Zalando's Load Test Conductor runs on a 60-second cadence (Source: sources/2021-03-01-zalando-building-an-end-to-end-load-test-automation-system-on-top-of-kubernetes):
- Poll Locust status for current user count + current orders-per-minute.
- From the observed (users, orders-per-minute) tuple, compute the current users-per-order ratio.
- Multiply by target orders-per-minute → derive required total user count.
- Subtract current user count → users to spawn this iteration.
- Compute hatch rate = users-to-spawn / iterations-left, where iterations-left is the number of ramp cycles remaining.
- Push the new hatch rate + user count to the Locust controller API.
- Sleep 60s. Repeat.
The ratio is re-measured every cycle, so the ramp self-corrects as the system's users-per-order ratio drifts.
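The per-cycle arithmetic in the steps above can be sketched as a pure function. This is a minimal illustration, not Zalando's code: the names `RampStep` and `plan_ramp_step` are hypothetical, rounding the required user count up and never scaling users down are assumptions, and polling or pushing to Locust is left out entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class RampStep:
    target_users: int   # total user count to push to the load generator
    hatch_rate: float   # users to add per remaining iteration


def plan_ramp_step(current_users: int, observed_opm: float,
                   target_opm: float, iterations_left: int) -> RampStep:
    """One cycle of the closed loop; expects non-zero users and orders/minute."""
    if current_users <= 0 or observed_opm <= 0 or iterations_left <= 0:
        raise ValueError("zero users, zero orders, or no iterations left")
    # Users currently needed to sustain one order per minute.
    ratio = current_users / observed_opm
    # Assumption: round up so the target is not missed by a fractional user.
    required_users = math.ceil(ratio * target_opm)
    # Assumption: an overshoot spawns nothing rather than removing users.
    users_to_spawn = max(required_users - current_users, 0)
    return RampStep(current_users + users_to_spawn,
                    users_to_spawn / iterations_left)


first = plan_ramp_step(100, 10, target_opm=50, iterations_left=8)
second = plan_ramp_step(150, 12, target_opm=50, iterations_left=7)
```

With these made-up readings, the first cycle aims for 500 users. By the second cycle the observed ratio has worsened from 10 to 12.5 users per order/minute, so the required total rises to 625 without any retuning of the schedule.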
Stall detection¶
Two edge cases are explicitly handled in Zalando's pseudocode:
- User count is zero. Log "load test is being initialized", set hatch rate to 1, user count to initial value, orders/minute to 0. The loop hasn't spun up yet.
- Orders/minute is zero. Log "load test stalled due to no orders getting generated", set hatch rate to 1, user count to 1. The system is taking load but no business transactions complete — a failure mode the test must report rather than try to force past.
Both are reset-to-safe behaviours: the algorithm avoids compounding a stuck state with more load.
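A minimal sketch of these two guards, run before any ratio is computed, might look as follows. The log messages and reset values are taken from the description above; the `Reset` type, the function name, and the `INITIAL_USERS` value are hypothetical.

```python
from typing import NamedTuple, Optional

INITIAL_USERS = 10  # hypothetical; the source only says "initial value"


class Reset(NamedTuple):
    message: str     # what to log for this cycle
    hatch_rate: int  # reset-to-safe hatch rate
    user_count: int  # reset-to-safe user count


def check_edge_cases(current_users: int, observed_opm: float,
                     initial_users: int = INITIAL_USERS) -> Optional[Reset]:
    """Return a reset-to-safe instruction, or None if the loop may proceed."""
    if current_users == 0:
        # The test has not spun up yet: start from the initial user count.
        return Reset("load test is being initialized", 1, initial_users)
    if observed_opm == 0:
        # Load is applied but no orders complete: fall back to minimal load.
        return Reset("load test stalled due to no orders getting generated", 1, 1)
    return None
```

Checking for zero users first matters: a freshly started test also reports zero orders, and it should be treated as initializing rather than stalled.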
Advantages¶
- Business-aligned signal. The pass/fail signal is "did we sustain 10,000 orders/minute", which is exactly the capacity-planning number the business wants.
- Self-correcting under degradation. When latency grows and the users/order ratio worsens, the algorithm spawns additional users to keep the KPI on target, which mirrors what a real peak event would do.
- Platform-change resilient. If a deploy changes the users-per-order ratio (different retry logic, different caching), the algorithm adapts without retuning the ramp.
Limitations¶
- Assumes the KPI is measurable in near-real-time. If the KPI signal lags (batched order confirmations, async fulfillment), the feedback loop has the same lag and the ramp overshoots.
- Can push the system past failure in pursuit of target KPI. If the target is unreachable, the algorithm keeps spawning users — producing a high-load saturation test rather than a specific-KPI test. Abort criteria must be layered on top.
- Noisy per-cycle measurements. 60 seconds of data on an organically noisy KPI can drive ramp oscillation. Zalando's 60s interval is the authors' balance between responsiveness and smoothing; it is not a universally correct value.
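Two of these limitations lend themselves to small guards layered around the loop. The sketch below is an assumption, not part of the cited source: it smooths the KPI reading with an exponentially weighted moving average to damp oscillation, and it applies a hard user cap as one possible abort criterion. The function names, the `alpha` default, and the cap are all hypothetical.

```python
from typing import Optional


def smooth_kpi(previous: Optional[float], observed: float,
               alpha: float = 0.5) -> float:
    """Exponentially weighted moving average of the per-cycle KPI reading."""
    if previous is None:
        # First cycle: nothing to average against yet.
        return observed
    return alpha * observed + (1 - alpha) * previous


def abort_reason(required_users: int, max_users: int) -> Optional[str]:
    """Return why the ramp should stop, or None to keep going."""
    if required_users > max_users:
        return f"required {required_users} users exceeds cap of {max_users}"
    return None
```

A lower `alpha` smooths more but reacts more slowly, which is the same responsiveness-versus-smoothing trade-off as the choice of cycle interval. A user cap is only one possible abort criterion; error-rate or latency thresholds could be checked in the same place.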
Seen in¶
- sources/2021-03-01-zalando-building-an-end-to-end-load-test-automation-system-on-top-of-kubernetes — the canonical instance. Target: orders-per-minute in Zalando's Payments + checkout flow. 60-second cycle, orchestrated by the Go Load Test Conductor against the Locust controller's API.
Related¶
- patterns/kpi-closed-loop-load-ramp-up — pattern form.
- systems/locust — the traffic generator.
- systems/zalando-load-test-conductor — the algorithm's host.
- concepts/feedback-control-loop-for-rollouts — structural kin: same closed-loop shape at a different altitude (fleet rollouts), same reset-to-safe-on-stall primitive.