
CONCEPT Cited by 1 source

Shadow cluster

Definition

A shadow cluster is a parallel cluster running a candidate release of a system, which receives a mirror of production traffic so that its behaviour can be compared against the current production release before the candidate is promoted.

It is a sibling of the canary. A canary serves a slice of real production traffic and returns its results to users; a shadow receives a copy of production traffic and never returns its results to users. Each catches different failure modes:

  • Canary catches regressions that appear quickly at any traffic share.
  • Shadow catches regressions that appear only at production scale or only on long-running workloads that a canary cluster does not exercise.
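
The routing difference between the two can be sketched as follows. This is a minimal illustration, not any particular system's implementation: `route_canary`, `route_shadow`, the `Handler` type, and the in-memory `shadow_log` are all hypothetical names, and a real mirror would run asynchronously rather than inline.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical handler type: takes a request, returns a response.
Handler = Callable[[str], str]


def route_canary(request: str, prod: Handler, candidate: Handler,
                 share: float, rng: random.Random) -> str:
    """Canary: a fraction `share` of requests is answered by the candidate,
    and the user sees the candidate's answer."""
    if rng.random() < share:
        return candidate(request)
    return prod(request)


def route_shadow(request: str, prod: Handler, candidate: Handler,
                 shadow_log: List[Tuple[str, str, str]]) -> str:
    """Shadow: production always answers the user. The candidate receives a
    copy of the request and its result is only recorded for comparison."""
    prod_result = prod(request)
    try:
        shadow_result = candidate(request)
    except Exception as exc:  # a candidate failure must never reach the user
        shadow_result = f"error: {exc!r}"
    shadow_log.append((request, prod_result, shadow_result))
    return prod_result
```

In the shadow path the candidate's output, including any failure, ends up only in the log, which is what makes it safe to mirror traffic to an unproven release.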

The Meta Presto example

sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale describes Meta's use of a Shadow Presto cluster specifically to catch post-compilation regressions on long-running queries:

  • New Presto builds first go to a Canary tier, which catches the majority of correctness / performance issues.
  • For long-running queries "where performance/correctness regressions can only be determined after a lot of work is done", a Shadow Presto cluster runs alongside production.
  • Production queries are mirrored to the Shadow cluster; the Shadow cluster runs the candidate release.
  • Results produced by Shadow are compared to results from production for correctness.
  • Performance counters and resource usage are compared as well.

Only when both Canary and Shadow signals are green does the candidate release graduate to the general fleet.
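
The promotion rule above can be sketched as a simple gate. The source does not describe how Meta implements the comparison, so everything here is an assumption made for illustration: the `QueryRun` fields, the 10% regression thresholds, and the choice to treat an empty comparison set as "not green".

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class QueryRun:
    """Hypothetical record of one query execution on one cluster."""
    result_hash: str        # digest of the (normalised) result set
    cpu_seconds: float      # performance counter
    peak_memory_bytes: int  # resource usage


def shadow_is_green(pairs: Iterable[Tuple[QueryRun, QueryRun]],
                    max_cpu_ratio: float = 1.10,
                    max_mem_ratio: float = 1.10) -> bool:
    """Return True only if every mirrored query matched production on
    correctness and stayed within the (assumed) resource thresholds."""
    seen_any = False
    for prod, shadow in pairs:
        seen_any = True
        if shadow.result_hash != prod.result_hash:
            return False  # correctness regression
        if shadow.cpu_seconds > prod.cpu_seconds * max_cpu_ratio:
            return False  # CPU regression
        if shadow.peak_memory_bytes > prod.peak_memory_bytes * max_mem_ratio:
            return False  # memory regression
    return seen_any  # no evidence is not treated as a pass


def can_promote(canary_green: bool, shadow_green: bool) -> bool:
    """The candidate graduates only when both signals are green."""
    return canary_green and shadow_green
```

A real gate would likely aggregate counters across many queries rather than fail on a single outlier; the per-query check here keeps the sketch short.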

Trade-offs

  • Cost. A shadow cluster roughly doubles the compute spent on the workload it mirrors, so teams usually narrow the mirror to a representative sample, such as a subset of long-running queries.
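  • Side-effect safety. Shadow must not write to the same externally visible state as production. This is easier for query engines like Presto, where a mirrored read-only SELECT query has no side effects, than for systems such as OLTP databases; write statements (e.g. INSERT) must be excluded from the mirror or redirected.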
  • Result comparison is hard. Non-determinism (ordering, floating point, time-dependent predicates) forces the validator to compare semantically rather than byte-for-byte.
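
As an illustration of semantic rather than byte-for-byte comparison, the sketch below treats two result sets as equal if they contain the same rows in any order, with floating-point columns compared under a tolerance. The function names, the rounding precision used for sorting, and the tolerance are assumptions chosen for the example; time-dependent predicates are not handled here and would need the query itself to be pinned to a fixed timestamp.

```python
import math
from typing import Any, Sequence, Tuple

Row = Tuple[Any, ...]


def _sort_key(row: Row) -> Tuple[Tuple[str, str], ...]:
    """Order-independent key: floats are rounded so that tiny numeric
    differences do not change the sort position. Values that straddle a
    rounding boundary can still sort differently; a production validator
    would need something more robust."""
    return tuple(
        (type(v).__name__, repr(round(v, 6)) if isinstance(v, float) else repr(v))
        for v in row
    )


def rows_match(prod_rows: Sequence[Row], shadow_rows: Sequence[Row],
               rel_tol: float = 1e-9) -> bool:
    """Compare two result sets ignoring row order and small float drift."""
    if len(prod_rows) != len(shadow_rows):
        return False
    pairs = zip(sorted(prod_rows, key=_sort_key),
                sorted(shadow_rows, key=_sort_key))
    for prod_row, shadow_row in pairs:
        if len(prod_row) != len(shadow_row):
            return False
        for a, b in zip(prod_row, shadow_row):
            if isinstance(a, float) and isinstance(b, float):
                if not math.isclose(a, b, rel_tol=rel_tol):
                    return False
            elif a != b:
                return False
    return True
```

For example, `rows_match([("a", 0.1 + 0.2), ("b", 1.0)], [("b", 1.0), ("a", 0.3)])` accepts the pair even though the rows arrive in a different order and the float sums differ in their last bits.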

Seen in
