

Parallel-run request doubling

Definition

Parallel-run request doubling is the operational cost envelope of a parallel run: every incoming request is processed twice, once by the authoritative implementation and once by the candidate implementation, plus the overhead of the comparison itself. Zalando's Returns post frames this as "the load across all components increases, potentially doubling."

Mechanism

  • Request arrives at the monolith (or equivalent old implementation) → processed normally.
  • Monolith fires async comparison request to the new service (see patterns/async-consistency-checker-sidecar).
  • New service re-issues the same request against its own real endpoint → processed a second time.
  • Response comparison consumes additional CPU / memory to run the diff, emit metrics, and potentially store comparison payloads for offline inconsistency investigation.
  • Downstream dependencies are also hit twice if the endpoint calls other services — so the doubling propagates through the call graph (a minimal sketch of the flow follows this list).
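
A minimal sketch of this duplicate-and-compare flow, assuming a Python handler inside the monolith. The candidate base URL, the thread-based fan-out, and the helper names (process_in_monolith, record_mismatch) are illustrative assumptions, not details from Zalando's post.

```python
import threading

import requests  # third-party HTTP client, used here only for brevity

# Hypothetical candidate endpoint; not a URL from the Zalando post.
CANDIDATE_BASE_URL = "http://new-returns-service"


def handle_request(path, payload):
    """Authoritative path: the monolith keeps serving the client as before."""
    monolith_response = process_in_monolith(path, payload)

    # Fire-and-forget comparison off the hot path: the candidate re-processes
    # the same request, so its downstream calls all happen a second time.
    threading.Thread(
        target=compare_with_candidate,
        args=(path, payload, monolith_response),
        daemon=True,
    ).start()

    return monolith_response  # client latency is unaffected by the comparison


def compare_with_candidate(path, payload, monolith_response):
    candidate_response = requests.post(
        f"{CANDIDATE_BASE_URL}{path}", json=payload, timeout=5
    ).json()
    if candidate_response != monolith_response:
        # Emit a mismatch metric and store both payloads for offline investigation.
        record_mismatch(path, monolith_response, candidate_response)


def process_in_monolith(path, payload):
    ...  # placeholder for the existing, authoritative implementation


def record_mismatch(path, old_response, new_response):
    ...  # placeholder: increment a counter, persist the diff payloads
```

Because the comparison runs off the request thread, client latency is preserved, but the candidate and everything downstream of it still pay a second full pass.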

The multiplier is a potential rather than a strict 2× (a rough arithmetic sketch follows the list) because:

  • Monolith may short-circuit some paths (caching, auth) that the new service re-runs from scratch — pushing the new service above 1×.
  • Some requests fail fast in one system and don't exercise the full stack in the other — pushing below 2×.
  • Comparison-only load (Prometheus scraping, Grafana queries, comparison-payload storage) adds a sub-1× tail.
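
A back-of-the-envelope way to combine these terms into an effective multiplier; every number below is an illustrative guess, not a figure from the source.

```python
# Rough decomposition of the effective multiplier; every number is illustrative.
monolith_cost = 1.0        # baseline: one full pass through the old implementation
candidate_cost = 1.1       # candidate re-runs paths the monolith short-circuits (> 1x)
fail_fast_saving = 0.05    # requests that fail fast in one system and skip the full stack
comparison_tail = 0.1      # diffing, metrics scraping, comparison-payload storage

effective_multiplier = monolith_cost + candidate_cost - fail_fast_saving + comparison_tail
print(f"~{effective_multiplier:.2f}x of pre-parallel-run load")  # ~2.15x with these guesses
```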

Why it matters

  • Capacity planning. Capacity provisioned for only 1× of monolith traffic gets silently overrun; 2× is the planning number (a worked example follows this list).
  • Downstream blast radius. A Returns service fanning out to payments, inventory, and user-lookup services doubles load on each — downstream teams must be in the loop.
  • Compute cost attribution. The parallel-run bill is a real migration cost, not free instrumentation.
  • Noisy-neighbour risk. Monolith + new service + comparator sharing infrastructure (database replicas, caches, Kafka topics, observability backends) can trigger unexpected contention.
  • Sets the duration budget. Because parallel-run cost is high, there's pressure to reach the readiness threshold faster — which in practice means endpoints don't sit in parallel run indefinitely.
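
To make the 2× planning number concrete, a small worked sizing example; the RPS and per-replica figures are hypothetical, not from the source.

```python
import math

# Hypothetical figures; none of these numbers come from the Zalando post.
peak_rps = 800                 # peak request rate across the endpoints in parallel run
rps_per_replica = 100          # sustainable throughput of one service replica
parallel_run_multiplier = 2.0  # plan for the doubled envelope, not the 1x steady state

replicas_needed = math.ceil(peak_rps * parallel_run_multiplier / rps_per_replica)
print(replicas_needed)  # 16 replicas, versus the 8 a 1x plan would suggest
```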

Zalando framing

"Given that requests received by the monolith are forwarded to the microservice, the load across all components increases, potentially doubling." (Source: sources/2021-11-03-zalando-parallel-run-pattern-a-migration-technique-in-microservices)

Zalando treats this as a known and expected operational expense, not a surprise, and does not quote absolute RPS numbers.

Mitigations

  • Sample, don't duplicate 100%. Compare a fraction of requests (e.g., 10%); this lowers the cost at the price of a slower readiness signal. Zalando's post doesn't document sampling; they appear to have run the full 2× envelope.
  • Async comparison off the hot path. Preserves client latency but doesn't reduce the compute cost; it only moves it off the request path.
  • Per-endpoint phased entry into parallel run. Don't enable parallel run on all endpoints at once — align with the per-endpoint cutover discipline.
  • Turn off parallel run on an endpoint once its threshold is met. Continuing to run parallel run after cutover wastes compute; stop once readiness is confirmed (the sketch after this list combines this with sampling and per-endpoint entry).
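
A sketch combining the sampling, per-endpoint, and turn-off mitigations into a single gate. The endpoint paths, the 10% rate, the readiness threshold, and the minimum sample count are all assumptions for illustration; Zalando's post documents none of them.

```python
import random

# Per-endpoint parallel-run configuration; paths, rates, and thresholds are illustrative.
PARALLEL_RUN = {
    "/returns/create": {"enabled": True, "sample_rate": 0.10},   # compare ~10% of requests
    "/returns/status": {"enabled": False, "sample_rate": 0.0},   # not yet in parallel run
}
READINESS_THRESHOLD = 0.999  # assumed consistency ratio required before cutover
MIN_COMPARISONS = 10_000     # don't trust the ratio until enough samples exist

stats = {path: {"compared": 0, "matched": 0} for path in PARALLEL_RUN}


def should_compare(path):
    """Gate the duplicate request: endpoint must be enabled and the sample must hit."""
    cfg = PARALLEL_RUN.get(path)
    return bool(cfg and cfg["enabled"] and random.random() < cfg["sample_rate"])


def record_comparison(path, matched):
    """Track consistency and switch parallel run off once the endpoint looks ready."""
    s = stats[path]
    s["compared"] += 1
    s["matched"] += int(matched)
    if s["compared"] >= MIN_COMPARISONS and s["matched"] / s["compared"] >= READINESS_THRESHOLD:
        PARALLEL_RUN[path]["enabled"] = False  # stop paying for an already-conclusive signal
```

In this sketch, should_compare would wrap the fire-and-forget call from the mechanism example, and record_comparison is called with the outcome of each diff.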
