Production-version cloning for load test

Definition

Production-version cloning for load test is the practice of programmatically determining the exact versions currently deployed in production for every application in scope of a load test, deploying those versions into a pre-production test cluster, and scaling them to match production's resource allocation and replica count — all automated as part of the load test's setup phase and repeated on every test run.

The alternative — manually tracking which version each service is running and deploying them by hand — is the default in most orgs and is the reason pre-prod load tests routinely fail to catch version-skew regressions.

Why it's non-trivial

A modern microservices platform has three characteristics that make version-parity hard:

  1. Independently deployed services. Every service has its own deploy cadence. A snapshot of production's version set at 10:00 is stale by 11:00.
  2. Multi-substrate heterogeneity. Kubernetes deployments have one manifestation of "what's running"; AWS ECS services have another; both must be queried.
  3. Resource config drift. Replica count and CPU/memory limits get tuned independently of version. A test cluster running the right version at half the resources is a different system.

Without automation, the test cluster drifts from production continuously, and the load test becomes a test of a fiction rather than of the production-bound system.
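The multi-substrate point can be made concrete. A minimal sketch, assuming one interface for "what's running" with a stubbed backend per substrate; the class and field names are illustrative, and real backends would call the Kubernetes Deployments API and the ECS DescribeServices API:

```python
# Hypothetical sketch: one "running state" interface, two substrate backends.
# Both backends are stubbed with in-memory dicts standing in for API responses.
from abc import ABC, abstractmethod


class Substrate(ABC):
    @abstractmethod
    def running_state(self, app: str) -> dict:
        """Return the deployed version and replica count for `app`."""


class KubernetesSubstrate(Substrate):
    def __init__(self, deployments):
        self.deployments = deployments  # stub for the Deployments API

    def running_state(self, app):
        d = self.deployments[app]
        # Version is conventionally the image tag.
        return {"version": d["image"].split(":")[1], "replicas": d["replicas"]}


class EcsSubstrate(Substrate):
    def __init__(self, services):
        self.services = services  # stub for DescribeServices

    def running_state(self, app):
        s = self.services[app]
        # Version is conventionally the task-definition revision.
        return {"version": s["taskDefinition"].rsplit(":", 1)[1],
                "replicas": s["desiredCount"]}


k8s = KubernetesSubstrate({"cart": {"image": "registry/cart:v87", "replicas": 6}})
ecs = EcsSubstrate({"legacy-search": {"taskDefinition": "search:42", "desiredCount": 4}})
print(k8s.running_state("cart"))           # {'version': 'v87', 'replicas': 6}
print(ecs.running_state("legacy-search"))  # {'version': '42', 'replicas': 4}
```

The abstraction is what lets a single conductor query both substrates with one code path; each backend hides how that substrate expresses "version" and "replicas".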

Zalando's Deployer + Scaler split

The Zalando Load Test Conductor (Source: sources/2021-03-01-zalando-building-an-end-to-end-load-test-automation-system-on-top-of-kubernetes) splits this into two named components:

  • Deployer: queries the Continuous Delivery Platform via the Kubernetes client to find production's current deployed version of each in-scope application; triggers a test-cluster deployment of that version; waits for rollout completion.
  • Scaler: reads production's resource allocation (replica count, CPU/memory) for each application; scales test-cluster deployments to match, supporting both Kubernetes and AWS ECS; after the test, reverts to the pre-test state as a cost saving measure.
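The two components above can be sketched as follows. This is a toy model, not Zalando's implementation: the cluster-facing calls are replaced with in-memory dicts, and the real Deployer would additionally wait for rollout completion:

```python
# Hypothetical sketch of the Deployer/Scaler split. `test_cluster` stands in
# for the test cluster's state; production data stands in for CD-platform and
# production-cluster queries.

class Deployer:
    def __init__(self, prod_versions, test_cluster):
        self.prod_versions = prod_versions  # app -> version, from the CD platform
        self.test_cluster = test_cluster    # app -> {"version": ..., "replicas": ...}

    def sync(self, apps):
        for app in apps:
            # Query production's current version and trigger a test deployment.
            self.test_cluster[app]["version"] = self.prod_versions[app]
            # A real implementation waits for rollout completion here.


class Scaler:
    def __init__(self, prod_replicas, test_cluster):
        self.prod_replicas = prod_replicas  # app -> replica count, from production
        self.test_cluster = test_cluster
        self.pre_test = {}                  # saved state for the post-test revert

    def scale_up(self, apps):
        for app in apps:
            self.pre_test[app] = self.test_cluster[app]["replicas"]
            self.test_cluster[app]["replicas"] = self.prod_replicas[app]

    def revert(self):
        # Post-test scale-down: the cost-saving step the source names.
        for app, replicas in self.pre_test.items():
            self.test_cluster[app]["replicas"] = replicas


test = {"cart": {"version": "v85", "replicas": 1}}
Deployer({"cart": "v87"}, test).sync(["cart"])
scaler = Scaler({"cart": 6}, test)
scaler.scale_up(["cart"])
print(test)   # {'cart': {'version': 'v87', 'replicas': 6}}
scaler.revert()
print(test)   # {'cart': {'version': 'v87', 'replicas': 1}}
```

Note that the revert restores only what the Scaler changed (replicas), not what the Deployer changed (version) — matching the source's description of the revert as a cost measure, not a full rollback.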

The feature-branch exception

The Deployer step is optional, per test run: "We made the deployment of the production versions of the applications an optional feature, so that developers can test their feature branch code." This lets the same automation serve two different test intents:

  • Cyber-Week-prep mode: all services run production's current versions → tests production-bound behaviour.
  • Feature-branch mode: the developer's service runs their branch, all dependencies run production's versions → tests the branch's behaviour under production-shape load.
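One way to model the per-run toggle is a version-resolution step that runs before the Deployer. This is a sketch under assumed names (`resolve_versions`, `branch_overrides` are not from the source); the two calls correspond to the two modes above:

```python
# Hypothetical sketch of per-run version resolution for the optional Deployer.

def resolve_versions(prod, branch_overrides=None, deploy_prod_versions=True):
    """Pick the version each in-scope app should run for this test run."""
    if not deploy_prod_versions:
        # Deployer skipped entirely: the test cluster keeps whatever is deployed.
        return {}
    chosen = dict(prod)
    for app, branch_version in (branch_overrides or {}).items():
        chosen[app] = branch_version  # the developer's branch build wins
    return chosen


prod = {"checkout": "v214", "cart": "v87", "payments": "v31"}

# Cyber-Week-prep mode: everything at production's current versions.
print(resolve_versions(prod))
# {'checkout': 'v214', 'cart': 'v87', 'payments': 'v31'}

# Feature-branch mode: cart runs the branch, dependencies stay at prod.
print(resolve_versions(prod, branch_overrides={"cart": "feature-x-rc3"}))
# {'checkout': 'v214', 'cart': 'feature-x-rc3', 'payments': 'v31'}
```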

The resource-config parity invariant

"applications in load test environment is updated to match resource allocation, number of instances and application version of the production environment". This is the full invariant: version + replica count + CPU/memory. Any one of these off and the load-test results are suspect.

The authors' conclusion names the gap honestly:

"Several infrastructure components like cluster node type, databases, centrally managed event queues ( Nakadi) had to be adjusted for similarity with the production environment. This required a lot of communication effort and alignment with teams managing the services."

Version + replica + resource-limit parity is the application layer of the invariant; node type + databases + shared event bus is the infrastructure layer, and the infrastructure layer requires cross-team negotiation the conductor cannot automate.

Trade-offs

  • Cost. Scaling to production replica count in a pre-prod cluster for 2 hours is expensive. Scale-down after the test is the mitigation; the Scaler handles it explicitly.
  • Concurrent-deploy race. Unrelated production deployments may land mid-flight, during a load-test window; the conductor has already captured the pre-deploy version at setup time and will not re-capture. Zalando calls this out as ongoing friction.
  • Stateful dependency parity is harder than stateless. The Deployer + Scaler handle stateless apps; stateful-infrastructure parity (database sizing, cache warmth, queue depth) remains manual.
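The concurrent-deploy race can at least be detected, even if it cannot be prevented: snapshot production's versions at test start, re-read them at test end, and flag any app redeployed mid-flight. A sketch, with illustrative names:

```python
# Hypothetical sketch: flag apps whose production version changed during the
# test window, so their results can be marked suspect.

def flag_mid_flight_deploys(snapshot_at_start, prod_at_end):
    return {app: (old, prod_at_end[app])
            for app, old in snapshot_at_start.items()
            if prod_at_end.get(app, old) != old}


start = {"checkout": "v214", "cart": "v87"}
end   = {"checkout": "v215", "cart": "v87"}  # checkout redeployed mid-test
print(flag_mid_flight_deploys(start, end))   # {'checkout': ('v214', 'v215')}
```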
