PATTERN Cited by 1 source
Side-by-side runtime validation¶
Side-by-side runtime validation is the rollout-risk discipline for replacing a foundational runtime component in a client or server when the new runtime's behavior (execution order, invalidation granularity, caching, scheduling) could silently diverge from the old runtime's in edge cases that unit tests don't cover.
The pattern: run both the old and new runtimes in parallel against real production workloads, for months, comparing data models and rendered output for correctness while tracking performance metrics, and flip to the new runtime only when both axes match.
Canonical source: sources/2026-04-21-figma-rebuilt-foundations-of-component-instances:
"We relied on extensive side-by-side validation. For months, we ran the old and new runtimes in parallel across hundreds of thousands of real production files, comparing both data models and rendered output to ensure designs looked and behaved identically.
Using the same validation process, we also collected metrics on performance and reasoned through performance tradeoffs. […] By tracking these metrics alongside correctness, we were able to identify regressions early and land targeted optimizations before rollout.
Only once the systems matched across correctness and performance did we begin gradually shipping the new architecture to production."
Shape¶
    ┌───────────────────────────┐
    │   Real production input   │
    │ (production file / event) │
    └──────────┬────────────────┘
               ├──────────┬──────────────┐
               ▼          ▼              ▼
          ┌──────────┐ ┌──────────┐ ┌────────────┐
          │   Old    │ │   New    │ │   Diff +   │
          │ runtime  │ │ runtime  │ │    perf    │
          └────┬─────┘ └────┬─────┘ │ collector  │
               │            │       └────────────┘
               └──AUTHORITY─┘
Key properties:
- Real inputs, not synthetic. Hundreds of thousands of production files (Figma: the actual user corpus). Synthetic tests miss real-world edge cases, which is the whole reason side-by-side validation is needed.
- Parallel execution, one authority. The old runtime stays authoritative; the new runtime runs alongside but doesn't emit user-visible output. Diffs go to a diagnostic stream.
- Two comparison axes. Both correctness (data model + rendered output) and performance (for example latency, memory, cache behavior). Figma explicitly names both axes.
- Long duration. "For months" — edge cases in behavioral differences surface at tail frequencies; they need real traffic time to accumulate.
- Rollout gated on both axes. Not just "did it render right" but also "did it stay within the perf budget." This is the stricter contract.
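The properties above can be sketched as a small harness. This is a minimal illustration under stated assumptions, not Figma's implementation: `SideBySideHarness`, `Observation`, and the two runtime callables are hypothetical names, the runtimes are modeled as side-effect-free functions, and they run one after the other here rather than truly in parallel.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Observation:
    """One side-by-side run: did the outputs match, and how long did each take."""
    input_id: str
    matched: bool
    old_seconds: float
    new_seconds: float
    new_error: Optional[str] = None


@dataclass
class SideBySideHarness:
    old_runtime: Callable[[Any], Any]  # authoritative: its output is served
    new_runtime: Callable[[Any], Any]  # candidate: its output is only compared
    observations: List[Observation] = field(default_factory=list)

    def run(self, input_id: str, payload: Any) -> Any:
        start = time.perf_counter()
        old_output = self.old_runtime(payload)
        old_seconds = time.perf_counter() - start

        new_output, new_error = None, None
        start = time.perf_counter()
        try:
            new_output = self.new_runtime(payload)
        except Exception as exc:  # a candidate failure must never reach users
            new_error = repr(exc)
        new_seconds = time.perf_counter() - start

        matched = new_error is None and new_output == old_output
        # The observations list stands in for the diagnostic stream.
        self.observations.append(
            Observation(input_id, matched, old_seconds, new_seconds, new_error)
        )
        return old_output


# Hypothetical runtimes: the candidate silently drops duplicates.
harness = SideBySideHarness(
    old_runtime=sorted,
    new_runtime=lambda xs: sorted(set(xs)),
)
harness.run("file-1", [3, 1, 2])  # outputs agree
harness.run("file-2", [2, 2, 1])  # candidate diverges, old output is still served
```

The caller only ever sees the old runtime's result. The divergence on `file-2` shows up solely in `harness.observations`, which is the "one authority" property in miniature.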
Why a stronger form than shadow validation¶
Side-by-side runtime validation is a stronger variant of the shadow-validation pattern. The differences:
| Axis | Shadow validation | Side-by-side runtime validation |
|---|---|---|
| What's running | One authoritative path + a prediction | Two full runtimes, parallel |
| Failure signal | Shadow says "won't see this," real path does | Direct diff of two outputs |
| Performance | Not typically tracked | Co-equal to correctness |
| Rollout gate | Correctness only | Correctness AND performance |
| Context | Augmented data structure (e.g. write-dep graph) | Complete runtime swap |
The step up is needed when the new runtime's failure mode isn't "will this data structure miss a case" but "will the overall behavior (layout order, invalidation cascade, caching) differ in ways users notice." That's exactly the Instance Updater → Materializer swap: the same outputs should emerge, but through a different execution shape.
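The "Correctness AND performance" rollout gate in the table can be expressed as a single predicate over collected samples. The sketch below is illustrative only: the `Sample` type, the p95 latency metric, and both thresholds are assumptions chosen for the example, not values reported by Figma.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    matched: bool   # did old and new outputs agree for this input?
    old_ms: float   # old runtime latency
    new_ms: float   # new runtime latency


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def ready_to_roll_out(
    samples: Sequence[Sample],
    max_mismatch_rate: float = 0.0,  # hypothetical correctness threshold
    max_p95_ratio: float = 1.05,     # hypothetical perf budget: at most 5% slower
) -> bool:
    if not samples:
        return False  # no evidence is not a pass
    mismatch_rate = sum(not s.matched for s in samples) / len(samples)
    p95_old = percentile([s.old_ms for s in samples], 95)
    p95_new = percentile([s.new_ms for s in samples], 95)
    correctness_ok = mismatch_rate <= max_mismatch_rate
    performance_ok = p95_new <= p95_old * max_p95_ratio
    return correctness_ok and performance_ok


good = [Sample(True, 10.0, 10.2)] * 20
slow = [Sample(True, 10.0, 12.0)] * 20
wrong = good[:-1] + [Sample(False, 10.0, 10.2)]
```

Here `good` passes, while `slow` fails on performance alone and `wrong` fails on correctness alone. A shadow-validation gate would only have rejected `wrong`.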
When to use¶
- Foundational runtime swap — replacing a core subsystem many others depend on (layout engine, query planner, reactive framework, network layer).
- Silent divergence is plausible — the failure class is subtle behavioral difference, not a crash or exception.
- Real-world input is the test oracle — unit tests can't enumerate the edge cases; production files / traffic can.
- You have the engineering budget — running two runtimes in parallel is a non-trivial cost, both in CPU and in the machinery to diff their outputs.
When not to use¶
- Pure-function replacements — if the new code's output is fully determined by inputs, hash-equality tests on a representative corpus are cheaper.
- Trivially-isolated components — feature-flag A/B on a small percentage is lower-cost and surfaces most failures.
- Data-model changes — if the two runtimes write different data models, diffing them requires projecting both onto a common form anyway; a different migration pattern (patterns/shadow-migration) usually fits better.
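For the pure-function case in the list above, a hash-equality check over a corpus can be as small as the following sketch. It assumes outputs are JSON-serializable; `output_digest`, `mismatching_inputs`, and the summary functions are hypothetical names for illustration.

```python
import hashlib
import json
from typing import Any, Callable, Iterable, List


def output_digest(value: Any) -> str:
    """Hash a JSON-serializable output in canonical (key-sorted) form."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def mismatching_inputs(
    old_fn: Callable[[Any], Any],
    new_fn: Callable[[Any], Any],
    corpus: Iterable[Any],
) -> List[Any]:
    """Return every corpus item whose old and new output digests differ."""
    return [
        item
        for item in corpus
        if output_digest(old_fn(item)) != output_digest(new_fn(item))
    ]


def old_summary(values):
    return {"count": len(values), "total": sum(values)}


def new_summary(values):  # same result, different key order
    return {"total": sum(values), "count": len(values)}


def buggy_summary(values):  # counts distinct values by mistake
    return {"total": sum(values), "count": len(set(values))}


corpus = [[1, 2, 3], [4, 4], []]
clean = mismatching_inputs(old_summary, new_summary, corpus)
broken = mismatching_inputs(old_summary, buggy_summary, corpus)
```

Because the digest is computed over a key-sorted serialization, `clean` is empty even though the two functions build their dictionaries in a different order, while `broken` isolates the single input that exposes the bug. Digests can also be stored as a baseline, so the old code does not need to run on every check.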
Adjacent patterns¶
- patterns/shadow-migration — dual-run both engines against the same inputs and reconcile statistically. Coarser; used for data-pipeline engine swaps (Amazon BDT Spark → Ray).
- patterns/shadow-validation-dependency-graph — shadow a derived data structure against the authoritative path. One-authority variant; used for Figma QueryGraph write-dep validation.
- patterns/pilot-component-language-migration — prove a new language on the smallest isolated component first with real perf data. Different scope (language, not runtime), same de-risking logic.
- patterns/ab-test-rollout — traffic-split rollout for user-facing changes. Can be combined with side-by-side validation as the final step after parallel validation clears.
Cost¶
- Parallel CPU / memory. Running two runtimes per input roughly doubles compute and memory cost during the validation window.
- Diff infrastructure. Data-model comparison, rendered-output comparison, performance metric collection must be built.
- Triage overhead. Real diffs surface; each one needs root-cause analysis before rollout can proceed. "Months" is partly this.
- Delayed shipping. The validation window is a multi-month gate between feature-complete and user-visible.
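To give a sense of what the "diff infrastructure" item involves at its simplest, here is a sketch of a structural data-model comparison that reports the path of every difference, which is what makes triage tractable. It assumes models are nested dicts and lists with string keys; the field names in the example are invented and do not reflect Figma's actual data model.

```python
from typing import Any, Iterator, Tuple

MISSING = object()  # sentinel for a key or index present on only one side


def diff_models(old: Any, new: Any, path: str = "$") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (path, old_value, new_value) wherever two nested models differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            yield from diff_models(
                old.get(key, MISSING), new.get(key, MISSING), f"{path}.{key}"
            )
    elif isinstance(old, list) and isinstance(new, list):
        for index in range(max(len(old), len(new))):
            old_item = old[index] if index < len(old) else MISSING
            new_item = new[index] if index < len(new) else MISSING
            yield from diff_models(old_item, new_item, f"{path}[{index}]")
    elif old != new:
        yield path, old, new


# Hypothetical outputs of the old and new runtimes for the same input.
old_model = {"id": "1:2", "children": [{"name": "Label", "width": 120}]}
new_model = {
    "id": "1:2",
    "children": [{"name": "Label", "width": 121}],
    "visible": True,
}
diffs = list(diff_models(old_model, new_model))
```

For these two models, `diffs` contains one entry for `$.children[0].width` (120 versus 121) and one for `$.visible`, which exists only in the new model. A production differ would additionally need tolerances for floating-point values and a rendered-output comparison, both of which are out of scope for this sketch.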
Seen in¶
- sources/2026-04-21-figma-rebuilt-foundations-of-component-instances — canonical instance. Materializer shipped only after months of side-by-side validation against the production file corpus, gated on matched correctness AND matched performance before staged rollout.
Related¶
- patterns/shadow-validation-dependency-graph — weaker sibling pattern; single-authority shadow of a derived data structure.
- patterns/shadow-migration — data-pipeline engine swap sibling.
- systems/figma-materializer — the system rolled out behind this discipline.