PATTERN Cited by 1 source
Shadow-then-reverse-shadow migration¶
Definition¶
The shadow-then-reverse-shadow migration pattern is a three-phase migration shape for two parallel implementations of the same CDC pipeline (legacy and new), in which the production-table writer swaps from one system to the other mid-migration:
- Shadow phase — new system runs in pre-production, writes to a separate shadow table; old system continues to write the production table.
- Reverse shadow phase — writes swap: new system writes the production table; old system, still running, writes the shadow table.
- Cleanup phase — old system (writing the shadow table) is removed.
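The three phases can be read as a mapping from phase to writer-to-table assignment. The following is a minimal sketch of that mapping, not the source's implementation; the names `Phase`, `writer_assignment`, `production_writer`, and the system and table labels are all hypothetical.

```python
from enum import Enum
from typing import Dict


class Phase(Enum):
    SHADOW = "shadow"
    REVERSE_SHADOW = "reverse_shadow"
    CLEANUP = "cleanup"


# Hypothetical labels for the two systems and the two tables.
OLD, NEW = "old_system", "new_system"
PRODUCTION, SHADOW_TABLE = "production_table", "shadow_table"


def writer_assignment(phase: Phase) -> Dict[str, str]:
    """Return which table each running system writes to in the given phase."""
    if phase is Phase.SHADOW:
        return {OLD: PRODUCTION, NEW: SHADOW_TABLE}
    if phase is Phase.REVERSE_SHADOW:
        # The writers swap: the new system becomes authoritative.
        return {NEW: PRODUCTION, OLD: SHADOW_TABLE}
    # Cleanup: the old system has been removed; only the new one remains.
    return {NEW: PRODUCTION}


def production_writer(phase: Phase) -> str:
    """Name the system that is authoritative for the production table."""
    for system, table in writer_assignment(phase).items():
        if table == PRODUCTION:
            return system
    raise ValueError(f"no production writer in {phase}")
```

In this sketch the production table has exactly one writer in every phase, and the old system disappears from the assignment only at cleanup.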
"In the first step of the lifecycle we set up shadow jobs in the pre-production environment to be delivered via the new system." … "Once the production job and the shadow job were running reliably in the production environment, we began the reverse shadow phase. In this phase, the shadow job's data was written to the production table, effectively making the shadow job the new production job. Meanwhile, the production job's data was written to the shadow table, so the original production job then acted as the shadow job." … "If no discrepancies were detected, the shadow job, now running on the old system, was removed. The new system then took over and continued delivering data through the production job, marking the completion of the migration." — Source: sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale
Why this shape¶
Two structural properties make this pattern well suited to CDC-system migrations:
- Continuous post-rollout signal. After the swap, the old system keeps running on the shadow table — providing a continuous comparison reference against the now-authoritative new system. Any divergence is detected via row-count + checksum comparison without consumer impact.
- Hot rollback substrate. "We could roll back fast if discrepancies were detected, without needing to recreate or reconfigure the old system job." The old job is still alive, already-running, already-configured; rollback = swap the writers back, not rebuild the old job.
Together these address the CDC bad-data propagation hazard: a data-quality bug introduced at rollout shows up in the live comparison and is rollback-able without consumer disruption.
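The comparison and the hot rollback can be sketched together. The source names row count plus checksum as the comparison but does not specify the checksum algorithm, so the order-independent SHA-256 sum below is an assumption, as are the function names `table_fingerprint`, `swap_writers`, and `check_reverse_shadow`.

```python
import hashlib
from typing import Dict, Iterable, Tuple

Row = Tuple[object, ...]


def table_fingerprint(rows: Iterable[Row]) -> Tuple[int, str]:
    """Return (row_count, checksum) for a table, independent of row order."""
    count = 0
    combined = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Add per-row digests modulo 2**256: order does not matter,
        # and duplicate rows still change the result (XOR would cancel them).
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


def swap_writers(assignment: Dict[str, str]) -> Dict[str, str]:
    """Swap which of the two systems writes which table."""
    (sys_a, table_a), (sys_b, table_b) = assignment.items()
    return {sys_a: table_b, sys_b: table_a}


def check_reverse_shadow(production_rows, shadow_rows, assignment):
    """Keep the assignment if the tables match; otherwise swap the writers back.

    Returns (assignment_to_use, rolled_back).
    """
    if table_fingerprint(production_rows) == table_fingerprint(shadow_rows):
        return assignment, False
    # Rollback is only a reassignment: the old job is still running.
    return swap_writers(assignment), True
```

Because the old job never stopped, the rollback path in this sketch touches only the assignment; nothing is recreated or reconfigured.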
Three machine-checkable promotion criteria¶
Each phase transition is gated by:
- Data-quality match: row count and checksum are identical between the shadow and production tables.
- No landing-latency regression: the new system must at least deliver data on time, and ideally faster.
- No resource-utilisation regression: compute and storage usage are equal to or better than the legacy job's.
For critical-table migrations, additional service-team-negotiated criteria apply.
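Because all three criteria are machine-checkable, a phase transition can be gated by a single predicate. The sketch below is illustrative only: the `JobMetrics` fields and units are hypothetical, and comparing the new job's latency against the legacy job's latency (rather than against a delivery deadline) is an assumption about what "regression" means here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobMetrics:
    """Hypothetical per-job measurements collected for one comparison window."""
    row_count: int
    checksum: str
    landing_latency_s: float
    compute_units: float
    storage_bytes: int


def promotion_blockers(legacy: JobMetrics, new: JobMetrics) -> List[str]:
    """List every criterion the new job fails; an empty list means promote."""
    blockers = []
    if (new.row_count, new.checksum) != (legacy.row_count, legacy.checksum):
        blockers.append("data-quality mismatch")
    if new.landing_latency_s > legacy.landing_latency_s:
        blockers.append("landing-latency regression")
    if (new.compute_units > legacy.compute_units
            or new.storage_bytes > legacy.storage_bytes):
        blockers.append("resource-utilisation regression")
    return blockers


def can_promote(legacy: JobMetrics, new: JobMetrics) -> bool:
    """True only when all three criteria hold."""
    return not promotion_blockers(legacy, new)
```

Returning the list of blockers rather than a bare boolean keeps the reason for a held promotion available to whoever investigates it.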
Distinguishing from adjacent patterns¶
| Pattern | Production-table writer during migration | Rollback substrate |
|---|---|---|
| Shadow-then-reverse-shadow (this) | Swaps — old then new | Old job still running on shadow table |
| Parallel run (Newman) | Stays old | Both systems running; new system never authoritative |
| Notion double-write | Single writer to two stores, then switchover | Reverse the dual-write direction |
| Dual-write migration | Single writer to two stores | Switch reads back to old store |
The shape's distinguishing characteristic is two separate writers, each writing to one store, with the writer-to-store assignment swapping at phase transitions. This gives a different fault model from dual-write: if the single writer crashes in dual-write, both stores miss updates; here, only the crashed writer's assigned store misses updates.
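The fault-model difference can be illustrated with a toy simulation. This is a sketch under simplifying assumptions: a crashed writer lands none of its updates, and the writer and store names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def land_updates(
    updates: Iterable[int],
    writers: Dict[str, List[str]],
    crashed: Set[str],
) -> Dict[str, List[int]]:
    """Return the updates that reach each store.

    `writers` maps a writer name to the stores it writes;
    writers listed in `crashed` land nothing.
    """
    stores: Dict[str, List[int]] = {
        store: [] for targets in writers.values() for store in targets
    }
    for update in updates:
        for writer, targets in writers.items():
            if writer in crashed:
                continue
            for store in targets:
                stores[store].append(update)
    return stores


# Dual-write: one writer feeds both stores, so its crash starves both.
dual = land_updates(
    [1, 2, 3],
    {"app_writer": ["old_store", "new_store"]},
    crashed={"app_writer"},
)

# Reverse shadow: one writer per table, so a crash starves only its own table.
split = land_updates(
    [1, 2, 3],
    {"old_system": ["shadow_table"], "new_system": ["production_table"]},
    crashed={"old_system"},
)
```

In the second case the production table keeps receiving updates while the shadow table falls behind, which then surfaces as a mismatch in the row-count and checksum comparison.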
When to use¶
- Two parallel implementations of the same CDC pipeline must coexist temporarily (e.g. migration between two ingestion systems against the same source).
- Source data is the same for both systems — the comparison primitive only works if both writers are reading the same source.
- Bad-data propagation is a defining hazard of the pipeline (any CDC system) — the swap-and-keep-old-running shape is what gives you both ongoing signal and hot rollback.
- Tens of thousands of jobs need to be migrated — combined with patterns/automated-job-lifecycle-promotion, this scales beyond any manually-gated migration shape.
When NOT to use¶
- The two systems read from different sources — the comparison primitive doesn't apply.
- The new system has different semantics (not a behavioural re-implementation but a new pipeline producing different data) — checksum comparison will fail by design.
- Single-job, low-stakes migration — the overhead of running two systems in pre-production then in production is justified only when scale or stakes warrant it.
- Strong consistency requirements between the two writers: this pattern allows transient divergence between the shadow and production tables, so external consumers that read both tables and expect cross-table consistency will not get it.
Composes with¶
- patterns/automated-job-lifecycle-promotion — the control-loop that drives phase transitions for tens of thousands of jobs.
- patterns/partition-marking-stops-cdc-bleeding — the containment primitive when a partition mismatch is detected during reverse shadow.
- patterns/data-quality-analysis-tool-with-edge-case-logging — the operational debugging substrate for investigating mismatches.
- patterns/snapshot-reuse-from-legacy-during-migration — the optimisation that skips the new system's first full-dump by seeding from the legacy system's snapshot.
Seen in¶
- sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale — Meta's data-ingestion-system migration; canonical wiki instance applied to tens of thousands of CDC ingestion jobs against MySQL.
Related¶
- concepts/migration-job-lifecycle — the wrapping state machine
- concepts/shadow-job-pre-production — phase 1 substrate
- concepts/reverse-shadow-phase — phase 2 substrate
- concepts/data-quality-checksum-comparison — the gating mechanism
- concepts/cdc-bad-data-propagation — the hazard this addresses
- patterns/parallel-run-pattern — adjacent pattern; production-writer stays old
- patterns/notion-double-write-backfill-verify-switchover — adjacent pattern; one writer dual-writes
- systems/meta-data-ingestion-system — canonical wiki instance
- companies/meta — company hub