CONCEPT Cited by 1 source

Shadow job (pre-production)

Definition

A shadow job runs in a pre-production environment, consumes the same source as a production job, and writes its output to a separate shadow table rather than to the production table. The shadow table is invisible to production consumers; the production job and table remain authoritative.
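
As a rough illustration of the definition, the sketch below derives a shadow job from a production job by keeping the source and swapping the environment and output table. The `JobSpec` fields, the `make_shadow_job` helper, the `_shadow` suffix, and the example names are all hypothetical; the source does not describe how Meta's job specifications are actually represented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical, minimal description of an ingestion job."""
    name: str
    environment: str  # e.g. "production" or "pre-production"
    source: str       # upstream the job reads from
    sink_table: str   # table the job writes to


def make_shadow_job(prod: JobSpec, suffix: str = "_shadow") -> JobSpec:
    """Return a pre-production copy of `prod` with the same source and a separate sink."""
    return replace(
        prod,
        name=prod.name + suffix,
        environment="pre-production",
        sink_table=prod.sink_table + suffix,  # never the production table
    )


# Illustrative names only.
prod_job = JobSpec(
    name="orders_ingest",
    environment="production",
    source="orders_db.orders",
    sink_table="warehouse.orders",
)
shadow_job = make_shadow_job(prod_job)
```

The production spec is left untouched (the dataclass is frozen), which mirrors the point that the production job and table stay authoritative while the shadow job runs.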

"In the first step of the lifecycle we set up shadow jobs in the pre-production environment to be delivered via the new system. This is essentially a production-realistic test that each shadow job consumed the same source as the production job but delivered data to a different table called the shadow table. This setup can help reveal issues because it exposes the new system to real production data and behavior, while still providing an isolated place to inspect outcomes and deploy fixes quickly." — Source: sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale

Why it's distinct from generic test environments

Three properties distinguish shadow jobs from generic pre-production test jobs:

  1. Same source as production. Test jobs typically read from synthetic or sampled data; shadow jobs read the actual production source. This is what makes the shadow job a production-realistic test.
  2. Separate output table. The shadow table is structurally isolated from the production table, so any bug in the shadow job is invisible to consumers. The source describes it as "an isolated place to inspect outcomes and deploy fixes quickly."
  3. Continuous comparison against production. Row-count and checksum mismatches between the shadow and production tables are logged continuously. This surfaces issues that synthetic-data testing would miss because they manifest only on real production source patterns.
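
The row-count and checksum comparison in the third property can be sketched as follows. This is a minimal illustration, not the comparison the source system uses: the `digest_rows` and `compare_tables` names are hypothetical, rows are assumed to be small in-memory tuples, and the checksum is an order-independent sum of per-row SHA-256 hashes, chosen here so that the two tables may be scanned in different orders.

```python
import hashlib
from typing import Iterable, List, NamedTuple, Sequence


class TableDigest(NamedTuple):
    row_count: int
    checksum: int


def digest_rows(rows: Iterable[Sequence[object]]) -> TableDigest:
    """Count rows and fold per-row hashes into an order-independent checksum."""
    count = 0
    checksum = 0
    for row in rows:
        # Encode each value with repr() and join with a separator byte so that
        # ("ab", "c") and ("a", "bc") do not collide.
        encoded = "\x1f".join(repr(value) for value in row).encode("utf-8")
        row_hash = int.from_bytes(hashlib.sha256(encoded).digest()[:8], "big")
        # Addition (not XOR) so that duplicated rows do not cancel each other out.
        checksum = (checksum + row_hash) % (1 << 64)
        count += 1
    return TableDigest(count, checksum)


def compare_tables(
    prod_rows: Iterable[Sequence[object]],
    shadow_rows: Iterable[Sequence[object]],
) -> List[str]:
    """Return human-readable mismatches; an empty list means the digests agree."""
    prod = digest_rows(prod_rows)
    shadow = digest_rows(shadow_rows)
    mismatches = []
    if prod.row_count != shadow.row_count:
        mismatches.append(
            f"row_count: production={prod.row_count} shadow={shadow.row_count}"
        )
    if prod.checksum != shadow.checksum:
        mismatches.append("checksum differs")
    return mismatches
```

In a real pipeline the returned mismatches would be written to a log or analytics store rather than returned to the caller, and the digest would more likely be computed by the storage engine than in application code.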

Operational behaviour during the shadow phase

  • Shadow job and production job run in parallel against the same source.
  • Mismatches are logged to a real-time analytics store (Scuba in Meta's case) — see patterns/data-quality-analysis-tool-with-edge-case-logging.
  • The shadow job's compute and storage usage is measured so that the production environment can be given sufficient headroom for the next lifecycle phase (Reverse Shadow).
  • Once the shadow job satisfies the promotion criteria (no data-quality issues; no landing-latency regression; no resource-utilization regression), it advances — first to the production environment, then to Reverse Shadow.

Comparison with related techniques

  • vs canary: a canary exposes the new binary to a fraction of traffic; a shadow job exposes the new binary to all source data but writes to a separate sink.
  • vs dark launch: a dark launch executes new code paths without exposing them to users; a shadow job executes the entire pipeline to a separate destination, so the outputs can be compared, not just the code paths exercised.
  • vs mirror traffic (e.g. Envoy request_mirror_policies): mirror traffic copies in-flight requests; shadow jobs copy batched / streaming source data at the data-pipeline grain.
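
The three promotion criteria listed above (no data-quality issues, no landing-latency regression, no resource-utilization regression) can be expressed as a simple gate. The sketch below is an assumption-laden illustration: the `RunMetrics` fields, the `promotion_blockers` function, and the `tolerance` parameter are hypothetical, and the source does not say which metrics or thresholds are actually used.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunMetrics:
    """Hypothetical per-job measurements collected during the shadow phase."""
    landing_latency_s: float  # time for data to land in the output table
    cpu_core_hours: float     # compute consumed
    storage_gb: float         # storage consumed


def promotion_blockers(
    shadow: RunMetrics,
    prod: RunMetrics,
    mismatch_count: int,
    tolerance: float = 0.0,
) -> List[str]:
    """Return the criteria the shadow job fails; an empty list means it may advance.

    `tolerance` is an assumed relative slack (0.05 allows the shadow job to be
    up to 5% worse than production before it counts as a regression).
    """
    blockers = []
    if mismatch_count > 0:
        blockers.append("data quality")
    limit = 1.0 + tolerance
    if shadow.landing_latency_s > prod.landing_latency_s * limit:
        blockers.append("landing latency")
    if (
        shadow.cpu_core_hours > prod.cpu_core_hours * limit
        or shadow.storage_gb > prod.storage_gb * limit
    ):
        blockers.append("resource utilization")
    return blockers
```

Returning the list of failed criteria, rather than a single boolean, keeps the reason for a blocked promotion visible to whoever inspects the shadow job.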

Seen in
