CONCEPT Cited by 1 source

Migration job lifecycle¶

Definition¶

A migration job lifecycle is the per-job state machine that tracks where a job stands in a multi-phase migration between two parallel implementations of the same logical pipeline. Each phase is gated by explicit promotion criteria, and each phase transition is reversible via demotion so that transient failures don't trap a job in a broken state.

"Our first step was to establish a clear migration job lifecycle to ensure data integrity and operational reliability throughout the process. Each job needed to be verified for correctness and had to meet defined success criteria before moving to the next step of the migration lifecycle." — Source: sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale

Canonical phases¶

Meta's data-ingestion-system migration uses a three-phase lifecycle (Source: sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale):

Phase	What happens	Production-table writer
Shadow Phase	New-system job runs in pre-production environment, consumes same source as production, writes to separate shadow table	Old system
Reverse Shadow Phase	Writes swap — new system writes production table; old system writes shadow table	New system
Migration Cleanup	Old-system job (now writing shadow table) is removed entirely	New system (sole writer)
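
The writer assignments in the table above can be sketched as a small lookup. This is an illustration only: the `Phase` enum, the `"old"`/`"new"` labels, and the helper names are hypothetical and are not taken from the source.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    SHADOW = "shadow"
    REVERSE_SHADOW = "reverse_shadow"
    CLEANUP = "cleanup"


# (production-table writer, shadow-table writer) for each phase.
# "old" and "new" are illustrative labels; None means no job writes that table.
WRITERS = {
    Phase.SHADOW: ("old", "new"),
    Phase.REVERSE_SHADOW: ("new", "old"),
    Phase.CLEANUP: ("new", None),
}


def production_writer(phase: Phase) -> str:
    """Return which system writes the production table in this phase."""
    return WRITERS[phase][0]


def shadow_writer(phase: Phase) -> Optional[str]:
    """Return which system writes the shadow table, if any."""
    return WRITERS[phase][1]
```

Keeping the assignment in one table makes the swap at the Shadow to Reverse Shadow transition explicit: the two entries simply trade places.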

Promotion criteria¶

Each phase transition is gated by machine-checkable criteria. The Meta lifecycle uses three (Source: sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale):

No data quality issues — row count + checksum match between old and new system's outputs (see concepts/data-quality-checksum-comparison).
No landing latency regression — new system delivers data on time at minimum, ideally faster.
No resource utilization regression — compute + storage are equal or better than the legacy job.

For critical-table migrations, additional team-specific criteria are negotiated on top of these three.
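
A minimal sketch of how the three criteria above might be evaluated for one job. The `JobMetrics` fields, the function name, and the choice to use the old system's measurements as the baseline are assumptions made for illustration; the source does not specify thresholds or field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobMetrics:
    # Hypothetical per-run measurements for one system's output.
    row_count: int
    checksum: str
    landing_latency_s: float
    cpu_hours: float
    storage_gb: float


def unmet_criteria(old: JobMetrics, new: JobMetrics) -> list[str]:
    """Return the names of promotion criteria the new system fails."""
    failures = []
    # 1. Data quality: row count and checksum must both match.
    if new.row_count != old.row_count or new.checksum != old.checksum:
        failures.append("data_quality")
    # 2. Landing latency: assumed baseline is the old system's latency.
    if new.landing_latency_s > old.landing_latency_s:
        failures.append("landing_latency")
    # 3. Resource utilization: compute and storage must be equal or better.
    if new.cpu_hours > old.cpu_hours or new.storage_gb > old.storage_gb:
        failures.append("resource_utilization")
    return failures
```

Returning the list of failed criteria, rather than a single boolean, tells an operator which regression is holding the job back.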

Demotion is structural, not exceptional¶

The lifecycle is bidirectional: a job that was promoted to a later phase can be automatically demoted if its criteria stop being met (e.g. a transient regression appears after promotion). This is what allows the lifecycle to be a continuous-control-loop primitive (patterns/automated-job-lifecycle-promotion) rather than a one-way state machine — without demotion, transient failures would require manual intervention to unstick.
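
The bidirectional transition can be sketched as a single step function. Two assumptions are made here and are not confirmed by the source: a single evaluation is enough to promote or demote, and Migration Cleanup is terminal because the old-system job no longer exists to fall back to. The phase names and function names are hypothetical.

```python
PHASES = ["shadow", "reverse_shadow", "cleanup"]


def next_phase(current: str, criteria_met: bool) -> str:
    """Promote one phase when criteria hold, demote one phase when they don't."""
    if current == "cleanup":
        # Assumption: cleanup is terminal, since the old job has been removed.
        return current
    i = PHASES.index(current)
    if criteria_met:
        return PHASES[i + 1]
    # Demote, but never below the first phase.
    return PHASES[max(i - 1, 0)]


def trace(start: str, observations: list[bool]) -> list[str]:
    """Apply next_phase once per observation and record each resulting phase."""
    phases = [start]
    for ok in observations:
        phases.append(next_phase(phases[-1], ok))
    return phases
```

For example, `trace("shadow", [True, False, True, True])` walks a job through promotion, a transient regression that demotes it back to shadow, and then recovery through to cleanup without manual intervention.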

Why it scales to tens of thousands of jobs¶

The lifecycle's value is abstracting per-job migration risk into a universal state machine — once the criteria are stable, the gating logic is the same for every job, and the only per-job work is fixing whatever is keeping a job out of its next phase. The automated promotion loop then drives every job through the lifecycle in parallel, with the operator surface (system-level + job-level dashboards) showing only the jobs that are stuck or regressing.
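
As a rough illustration of the operator surface described above, the sketch below summarizes a job population and surfaces only the jobs with unmet criteria. The record shape (`"phase"`, `"unmet"`) and the function name are hypothetical, not Meta's dashboard schema.

```python
from collections import Counter


def operator_view(jobs: dict[str, dict]) -> dict:
    """Summarize jobs, where jobs maps job id -> {"phase": str, "unmet": list[str]}."""
    # System-level view: how many jobs sit in each phase.
    by_phase = Counter(job["phase"] for job in jobs.values())
    # Job-level view: only jobs still migrating that fail at least one criterion.
    needs_attention = {
        job_id: job["unmet"]
        for job_id, job in jobs.items()
        if job["phase"] != "cleanup" and job["unmet"]
    }
    return {"by_phase": dict(by_phase), "needs_attention": needs_attention}
```

Because the gating logic is identical for every job, the same summary works whether the population is three jobs or tens of thousands; only the size of `needs_attention` reflects per-job work.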

How the lifecycle differs from related approaches:

vs parallel run: parallel run keeps the old system authoritative throughout; this lifecycle includes a phase (Reverse Shadow Phase) in which the new system is authoritative before cleanup.
vs patterns/notion-double-write-backfill-verify-switchover: Notion's pattern has one writer dual-writing to two stores; this lifecycle has two separate writers, each writing to one store, with the writer-to-table assignment swapping at phase transitions.
vs canary deployment: a canary controls the exposure of one binary to a fraction of traffic; the migration job lifecycle controls which system writes which table across the entire job population.

Seen in¶

sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale — Meta's three-phase Shadow → Reverse Shadow → Cleanup lifecycle for tens of thousands of CDC ingestion jobs. Canonical wiki instance.

concepts/shadow-job-pre-production — phase 1 substrate
concepts/reverse-shadow-phase — phase 2 substrate
concepts/data-quality-checksum-comparison — promotion-criterion #1 mechanism
concepts/landing-latency — promotion-criterion #2 SLI
patterns/shadow-then-reverse-shadow-migration — the canonical pattern wrapping this lifecycle
patterns/automated-job-lifecycle-promotion — the control-loop pattern that drives the lifecycle at scale
systems/meta-data-ingestion-system — canonical wiki instance
companies/meta — company hub