PATTERN
Load-test at scale (before real workloads)¶
Load-testing at scale is the practice of running a synthetic workload on a new platform, sized to match the largest real workloads you plan to host there, before those real workloads are migrated in. The goal is to force-discover the platform-sizing problems that only manifest at production cardinality.
The failure mode it prevents¶
New-platform load tests commonly run "enough to exercise the code path," not "enough to exercise the control-plane at real fan-out." The result: the migration's first production workload becomes the load test, and scaling bugs blow up with a customer on the other end.
Typical scale-only failure modes:
- Control-plane components (API server, controllers, policy engine, cluster DNS) are undersized, slowing pod scheduling and startup.
- Metric-cardinality explosion surfaces only at N_services × N_pods.
- EDS / xDS push volume overwhelms clients under real topology size.
- Networking limits (conntrack table size, ephemeral-port exhaustion) trip at production QPS.
- Scheduler scoring becomes a bottleneck at N_nodes × N_pods.
None of these show up in a 10-pod test.
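The cardinality point is easy to make concrete with back-of-envelope arithmetic (all numbers below are hypothetical, not from the source):

```python
# Back-of-envelope: active time series grow multiplicatively with pod count.
# metrics_per_pod and label_combos are illustrative assumptions.

def series_count(n_pods: int, metrics_per_pod: int = 200, label_combos: int = 5) -> int:
    """Rough count of active time series emitted by one service's pods."""
    return n_pods * metrics_per_pod * label_combos

print(series_count(10))    # 10000 — a 10-pod smoke test; any metrics backend is fine
print(series_count(3000))  # 3000000 — production pod count; ingestion/cardinality limits trip here
```

Same code path both times; only the multiplier changes, which is exactly why the small test passes.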
The move¶
- Define a "Hello, World" service — minimal logic, just enough to exercise the end-to-end platform path (deploy → schedule → health → serve → observe).
- Scale it up to the pod count of your biggest real service, not to the test-cluster's node capacity.
- Observe what breaks or slows down — each one is a platform tuning issue to fix before onboarding real workloads.
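In Kubernetes terms, the move can be sketched as a single trivial Deployment whose replica count is copied from the largest real workload, not from what the test cluster comfortably holds (names, image, and replica count below are illustrative, not from the source):

```yaml
# Hypothetical hello-world Deployment; replicas matches the biggest
# real service's pod count, not the test cluster's capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  replicas: 3000            # pod count of the largest real service
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
        - name: hello
          image: nginxdemos/hello   # any trivial HTTP server works
          readinessProbe:           # exercises the health-check path
            httpGet:
              path: /
              port: 80
```

While it scales, watch scheduling latency, admission latency, DNS, and the metrics pipeline; each slowdown is a platform tuning item.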
Figma's instantiation¶
Figma scaled a Hello-World service to the same pod count as its largest services pre-migration. Outcome: they had to tune the size and scale of the core compute services that support the platform. One named example: Kyverno (cluster security assertions). If Kyverno is undersized, new-pod startup slows, because every admission check passes through it.
Without this load test, the discovery would have happened when Figma's first real service migrated in — and the slow pod startup would have manifested as service-degradation symptoms rather than a cleanly-attributable platform-layer issue.
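The Kyverno effect reduces to simple queueing arithmetic (throughput numbers are hypothetical): if every pod creation passes through an admission webhook, webhook capacity sets a floor on rollout time once pod count exceeds what a small test ever reaches.

```python
# Hypothetical arithmetic: admission-webhook throughput bounds rollout speed.
# webhook_rps is an assumed figure, not Figma's measured number.

def admission_floor_seconds(n_pods: int, webhook_rps: float) -> float:
    """Lower bound on time to admit n_pods when every pod creation
    passes through a webhook handling webhook_rps reviews per second."""
    return n_pods / webhook_rps

print(admission_floor_seconds(10, webhook_rps=50))    # 0.2 — invisible in a small test
print(admission_floor_seconds(3000, webhook_rps=50))  # 60.0 — the webhook becomes the rollout clock
```

The bug isn't in the admission logic; it's in the sizing, which is why only the at-scale run finds it.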
Contrast with shadow migration¶
- patterns/shadow-migration runs real production inputs through the new system alongside the old to validate correctness-at-scale for data workloads.
- Load-test-at-scale uses synthetic workloads to validate control-plane and orchestration behavior for compute workloads. The shape of the workload doesn't matter; the cardinality does.
Complementary: large-scale platforms often use both at different phases.
Related practice: migrate one real service to prod before staging is fully built¶
Figma's other note: "We even migrated one of our services over before we had finished building the staging environment, and it turned out to be well worth it; it quickly derisked the end to end ability to effectively run workloads and helped us identify bottlenecks and bugs." Combined with the Hello-World test, this is a real data over staged data principle at the migration-validation tier.
Seen in¶
- sources/2024-08-08-figma-migrated-onto-k8s-in-less-than-12-months — Hello-World scaled to largest-service-pod-count; Kyverno sizing regression surfaced this way.
- sources/2024-10-28-dropbox-robinhood-in-house-load-balancing — Dropbox load-tested the Robinhood PID-control LB at production fanout scale before rolling it out; similar "platform-sizing first, tenants second" ordering.
Related¶
- patterns/scoped-migration-with-fast-follows — load-testing is one of the migration-execution disciplines this pattern depends on
- patterns/shadow-migration — correctness-validation counterpart