PATTERN Cited by 1 source
Parallel staging pipeline for production verification
Summary
Run a second copy of a data pipeline, the staging pipeline, alongside the production pipeline. The staging pipeline reads the same production data as the production pipeline, but executes the code under test and writes to a separate verification substrate that can be queried with low latency. Verify correctness by comparing staging output against production output (or an independent source of truth) with integrity checks. The code changes ship to the production pipeline only after verification passes.
Structure
                Production data (shared)
                            │
               ┌────────────┴────────────┐
               ▼                         ▼
      Production pipeline        Staging pipeline
         (stable code)           (code under test)
               │                         │
               ▼                         ▼
       Production output        Verification output
        (authoritative)         (throwaway scratch)
               │                         │
               │                         ▼
               │                Integrity checkers
               │                 (cross-reference
               │                   production /
               │                independent truth)
               │                         │
               └──── Promote code ◄──────┘
                  after verification
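The flow in the diagram can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not Yelp's implementation: `stable_code`, `code_under_test`, `integrity_check`, and `maybe_promote` are hypothetical names, and the toy rows stand in for the shared production snapshot. Both versions read the same input; only the candidate's output is treated as scratch, and production keeps the stable code unless the check passes.

```python
def stable_code(rows):
    """Hypothetical production transform: total cents per account."""
    totals = {}
    for row in rows:
        totals[row["account"]] = totals.get(row["account"], 0) + row["cents"]
    return sorted(totals.items())


def code_under_test(rows):
    """Hypothetical rewrite that is supposed to produce identical output."""
    accounts = sorted({row["account"] for row in rows})
    return [(a, sum(r["cents"] for r in rows if r["account"] == a)) for a in accounts]


def run_in_parallel(snapshot, stable, candidate):
    """Run both versions over the same shared input."""
    production_output = stable(snapshot)       # authoritative
    verification_output = candidate(snapshot)  # throwaway scratch
    return production_output, verification_output


def integrity_check(production_output, verification_output):
    """Return keys whose values disagree; an empty list means the check passed."""
    prod = dict(production_output)
    staged = dict(verification_output)
    return sorted(k for k in prod.keys() | staged.keys() if prod.get(k) != staged.get(k))


def maybe_promote(snapshot, stable, candidate):
    """Return (code to run in production, mismatches)."""
    prod_out, staged_out = run_in_parallel(snapshot, stable, candidate)
    mismatches = integrity_check(prod_out, staged_out)
    # Production keeps the stable code unless verification passes.
    return (candidate if not mismatches else stable), mismatches


snapshot = [
    {"account": "a", "cents": 100},
    {"account": "b", "cents": 50},
    {"account": "a", "cents": 25},
]
active, mismatches = maybe_promote(snapshot, stable_code, code_under_test)
print(active.__name__, mismatches)  # code_under_test []
```

Here the comparison target is production output. Replacing `integrity_check` with a comparison against an independent source of truth would leave the promotion gate itself unchanged.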
When to use
Apply this pattern when all of these conditions hold:
- You have a data pipeline (batch or near-batch) rather than a request-handling service. (Shadow traffic is the request-side equivalent.)
- Dev-environment fixtures don't cover production diversity. You've been burned by edge cases that first appeared in prod.
- Running the pipeline twice is affordable: the compute cost of the parallel run is acceptable.
- Testing directly in prod would make bad code expensive to roll back, e.g. because you'd have to repair corrupted prod data.
Conversely, avoid it if the pipeline is cheap to rerun from scratch (feature-branch experiments then suffice), or if the production substrate is already fast enough that you don't need a parallel path.
Mechanism (Yelp's canonical implementation)
Yelp's implementation for the Revenue Data Pipeline, as described in its 2025-05-27 write-up:
- Config-driven duplication — a separate data-pipeline configuration instance; not a whole new codebase.
- Shared input — both pipelines read the same production MySQL snapshot from S3.
- Separate output substrate — production writes via the Redshift Connector to Redshift; staging writes to AWS Glue catalog tables on S3.
- Low-latency query path for staging — verification queries the Glue tables directly through Redshift Spectrum, so there is no wait for the connector.
- Bilateral discipline — "the production pipeline and its data were left untouched until the new changes were verified."
- Integrity-checker coupling — daily SQL checks run against staging output; monthly SQL checks compare production output with the billing system as the source of truth.
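Config-driven duplication can be sketched by deriving the staging configuration from the production one and overriding only the fields that are meant to differ, so the shared input cannot silently diverge. The field names, `s3://example-bucket/...` paths, and table name below are hypothetical placeholders, not Yelp's actual configuration. A small drift check reports any other difference.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class PipelineConfig:
    # Hypothetical fields; a real pipeline config will look different.
    input_snapshot: str   # shared production snapshot location
    code_version: str     # which build of the pipeline code to run
    sink: str             # e.g. "redshift_connector" or "glue_s3"
    output_location: str  # where this instance writes


PRODUCTION = PipelineConfig(
    input_snapshot="s3://example-bucket/mysql-snapshot/",
    code_version="stable",
    sink="redshift_connector",
    output_location="revenue_schema.revenue_table",
)

# Fields that staging is *allowed* to change; anything else is drift.
ALLOWED_TO_DIFFER = {"code_version", "sink", "output_location"}


def staging_from(prod, code_version):
    """Same input as production, code under test, separate scratch sink."""
    return replace(
        prod,
        code_version=code_version,
        sink="glue_s3",
        output_location="s3://example-bucket/staging-verification/",
    )


def config_drift(prod, staging):
    """Return {field: (prod_value, staging_value)} for unexpected differences."""
    p, s = asdict(prod), asdict(staging)
    return {k: (p[k], s[k]) for k in p if k not in ALLOWED_TO_DIFFER and p[k] != s[k]}


staging = staging_from(PRODUCTION, code_version="candidate-123")
print(config_drift(PRODUCTION, staging))  # {}
```

Deriving staging from production, rather than maintaining a second hand-written config, is one way to limit the config drift listed under trade-offs.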
Trade-offs
- Compute cost ~2×. Both pipelines run in full.
- Operational complexity. Two pipelines to keep in sync at the config level; config drift between staging and prod can mask real bugs.
- Access control. Staging consumes production data, so a non-production pipeline ends up holding production read permissions it may not strictly need.
- Integrity-check maturity required. The staging pipeline is worth nothing without an integrity check that actually compares its output to production output or an independent source of truth. Without comparison, you just have a second authoritative-looking dataset.
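An integrity check of this kind can be a single SQL query that returns every row where staging output disagrees with the reference output, including rows missing on either side. The sketch below uses Python's built-in `sqlite3` as a local stand-in for querying Glue tables through Redshift Spectrum; the table and column names (`production_revenue`, `staging_revenue`, `account_id`, `revenue_cents`) and the `load` helper are hypothetical. The reference table could equally hold billing-system truth instead of production output.

```python
import sqlite3

# Hypothetical check: revenue per account must match between the reference
# table and the staging (verification) table, with no extra or missing rows.
MISMATCH_SQL = """
SELECT p.account_id, p.revenue_cents AS expected, s.revenue_cents AS actual
FROM production_revenue AS p
LEFT JOIN staging_revenue AS s ON s.account_id = p.account_id
WHERE s.account_id IS NULL OR s.revenue_cents <> p.revenue_cents
UNION ALL
SELECT s.account_id, NULL, s.revenue_cents
FROM staging_revenue AS s
LEFT JOIN production_revenue AS p ON p.account_id = s.account_id
WHERE p.account_id IS NULL
ORDER BY 1
"""


def integrity_mismatches(conn):
    """Return (account_id, expected, actual) for every disagreeing row."""
    return conn.execute(MISMATCH_SQL).fetchall()


def load(conn, table, rows):
    # Table names come from this script, never from user input.
    conn.execute(
        f"CREATE TABLE {table} (account_id TEXT PRIMARY KEY, revenue_cents INTEGER NOT NULL)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, "production_revenue", [("a", 125), ("b", 50)])
load(conn, "staging_revenue", [("a", 125), ("b", 51), ("c", 10)])
print(integrity_mismatches(conn))  # [('b', 50, 51), ('c', None, 10)]
```

An empty result means the check passed; any returned row is a concrete discrepancy to investigate before promoting the code.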
Variations
- Full parallel (Yelp's shape): both pipelines run end-to-end; verification is an out-of-band integrity check.
- Shadow-write (request-response analogue): production handles the real request; a shadow path mirrors it and verifies the result. Pattern name: patterns/shadow-migration.
- Canary pipeline: a fraction of production inputs is routed to the new pipeline and the rest to the old one. The risk profile differs, because the canary slice can produce partial bad output.
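The canary variation needs a deterministic rule for which inputs go to the new pipeline. One common choice, assumed here rather than taken from the sources above, is to hash a stable record key into buckets so the same record always lands on the same side across reruns. `canary_bucket`, `route`, `BUCKETS`, and the `fraction` parameter are hypothetical names.

```python
import hashlib

BUCKETS = 10_000  # resolution of the split (0.01% steps)


def canary_bucket(key):
    """Map a stable record key to a bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def route(key, fraction):
    """Send roughly `fraction` of keys to the new pipeline, the rest to the old."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return "new" if canary_bucket(key) < fraction * BUCKETS else "old"


print(route("account-42", 0.0), route("account-42", 1.0))  # old new
```

Hashing instead of random sampling keeps the canary slice stable, which makes any bad output on that slice easier to locate and repair.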
Cross-source links
- Datadog's patterns/schema-validation-before-deploy inverts the time axis: validate at deploy time, not at runtime. It is useful when the downstream contract is yours; Yelp's staging-pipeline approach is needed when you also have to validate runtime behaviour against inputs that exist only in production.
- Shadow-pipeline testing at systems/slack-deploy-safety-program is the analogue on the request-handling side.
Seen in
- sources/2025-05-27-yelp-revenue-automation-series-testing-an-integration-with-third-party-system — canonical instance. Yelp's Revenue Data Pipeline runs this exact pattern. Pattern name canonicalised on this ingest.