PATTERN Cited by 1 source
Dual-system sync during migration¶
Dual-system sync during migration is the intermediate-state posture where a legacy system and a strategic system that replaces it are both live in production, and the team keeps them synchronized while they migrate consumers / data / traffic / customers off the legacy one. Unlike patterns/dual-write-migration — which is the same idea applied to emission of data (both pipelines get writes) — dual-system sync covers the broader case of any two systems that authoritatively describe the same thing (service topologies, routing config, identity state, feature flags, schemas), and where the synchronization is itself a production pathway with its own bugs.
Shape of the state¶
- Both legacy and strategic systems expose operations that mutate the shared domain ("where is this service advertised?", "what permissions does this user have?").
- A sync mechanism (periodic reconcile, event-driven, write- through, bidirectional adapter) keeps them equivalent.
- Consumers may read from either side during the migration window.
- Eventually the legacy system is frozen, then decommissioned, then removed.
Why it's a risk surface¶
Two orthogonal failure classes:
- Sync mechanism bugs. The adapter / reconciler itself has code and state. A bug in it can produce exactly the latent- misconfig shape — two systems that quietly disagree until the next read, edit, or evaluation cycle surfaces the inconsistency. The longer the migration, the longer the window for these bugs to sleep.
- Legacy-surface edits bypassing strategic-surface disciplines. The strategic system often carries new safety mechanisms the legacy one lacks — canary rollout, schema validation, progressive deployment, health-signal gating. A change made through the legacy edit path sidesteps all of those. During the migration window, the legacy path is an active, safety-weaker alternative — not a frozen read-only artefact.
Both failure classes tend to have their worst incident payoff when they co-occur with a third change on the shared surface (an unrelated edit, a feature flip, a deploy) — i.e. they tend to activate via the same trigger-event mechanism that powers concepts/latent-misconfiguration.
Exit strategy¶
The post-mortem discipline that works: shorten the window.
- Accelerate migration off the legacy surface.
- Raise documentation and test coverage on the remaining legacy surface during the transition so sync bugs are caught earlier.
- Do not add new features to the legacy surface — every added feature extends the migration window.
- Harden the sync boundary with referential invariants that reject cross-system inconsistencies at edit time.
The point is that lingering in the dual state is itself the risk, and the fix is not "make the sync bulletproof" (you can't, forever) but "finish the migration".
Canonical wiki instance¶
sources/2025-07-16-cloudflare-1111-incident-on-july-14-2025 — Cloudflare's address / topology management was mid-migration from a legacy system ("hard-coding explicit lists of data center locations and attaching them to particular prefixes") to a strategic one ("describe service topologies without needing to hard-code IP addresses"). Both systems existed and had to be synchronized. The 2025-06-06 bug was introduced on the legacy surface — which, critically, lacks progressive-deployment discipline. The 2025-07-14 blast showed both failure classes at once: the legacy surface accepted a cross-reference the strategic system would have flagged, and a change made through that same legacy surface deployed fleet-wide without staged rollout. The stated remediation: accelerate deprecation of the legacy systems, and migrate every addressing change onto the strategic system.
We are currently in an intermediate state in which current and legacy components need to be updated concurrently, so we will be migrating addressing systems away from risky deployment methodologies like this one. We will accelerate our deprecation of the legacy systems in order to provide higher standards for documentation and test coverage.
Relationship to adjacent patterns¶
- patterns/dual-write-migration — the emission-layer specialisation (both pipelines receive writes); shares the 2× cost and consistency tax.
- patterns/progressive-configuration-rollout — the strategic system's contribution that the legacy system lacks; often the reason for the migration in the first place.
Seen in¶
- sources/2025-07-16-cloudflare-1111-incident-on-july-14-2025 — canonical wiki instance; legacy/strategic topology systems running in parallel with a sync boundary is the structural context in which the 07-14 incident occurred, and shortening-the-window is the structural remediation.