
PATTERN Cited by 1 source

Async block-clone for stateful migration

Intent

Relocate a stateful workload (VM plus attached large volume) from one host to another with near-stateless interruption time: boot the target immediately against a lazily-hydrated clone that falls through to the source over the network for blocks not yet fetched.

When to use

  • The workload has a locally-attached volume the destination host doesn't already have — no SAN fabric to just remount.
  • The volume is large enough that full pre-copy would produce unacceptable interruption time (multi-GB with sub-minute interruption budgets).
  • Writes during migration can't be lost (rules out copy → boot → kill).
  • The source and destination can talk over a network block protocol — iSCSI, NFS-over-block, NBD, or a proprietary protocol.
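To see why full pre-copy blows the interruption budget, a quick back-of-the-envelope. The volume size and throughput here are illustrative assumptions, not figures from the source:

```shell
# Illustrative only: a 200 GiB volume copied at 250 MiB/s of sustained
# network throughput. Both numbers are assumptions for the arithmetic.
SIZE_GIB=200
THROUGHPUT_MIB_S=250
COPY_SECS=$(( SIZE_GIB * 1024 / THROUGHPUT_MIB_S ))
echo "full pre-copy: ${COPY_SECS}s"   # ~819s, far past a sub-minute budget
```

Even at a generous sustained rate, pre-copying hundreds of GiB takes on the order of tens of minutes, which is why the clone has to hydrate lazily after boot rather than before it.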

Structure

Components (canonical Fly.io stack from Making Machines Move):

| Role | Fly.io instance | Generic |
| --- | --- | --- |
| Source block device on drain-target worker | Fly Volume | Locally-attached block device |
| Network block protocol | iSCSI (over the [6PN mesh](../systems/fly-wireguard-mesh)) | Any network block protocol |
| Kernel async-clone target | dm-clone | Any async-clone block device |
| Metadata bitmap | dm-clone metadata device | Hydrate-state bitmap |
| Orchestrator | flyd BoltDB FSMs | Platform orchestrator |
| New workload | Fly Machine on target worker | VM / container |
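The "kernel async-clone target" row corresponds to a single device-mapper table line. A sketch with hypothetical device paths and sizes (the table format is from the kernel's dm-clone documentation; region size 8 means 4 KiB regions in 512-byte sectors):

```shell
# dm-clone table format:
#   clone <metadata dev> <destination dev> <source dev> <region size>
# All device paths below are hypothetical; the length (in sectors)
# must match the source device.
dmsetup create migrated-vol --table \
  "0 419430400 clone /dev/nvme1n1p1 /dev/nvme0n1p2 /dev/mapper/remote-src 8"
```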

Steps

  1. flyd on source (worker-xx-cdg1-1) stops the Fly Machine (kills the writer, closing the last-write window).
  2. flyd sends an RPC to flyd on destination (cdg1-2) with migration metadata — critically, source LUKS2 header size (see patterns/fsm-rpc-for-config-metadata-transfer) and any other fleet-skew parameters the target needs.
  3. The source exposes the Volume over the network block protocol (iSCSI target on the source side); the destination logs in as the initiator.
  4. Destination creates a dm-clone instance with the remote source device + a fresh local clone device + a metadata device.
  5. Destination runs fstrim on the decrypted view of the source to issue DISCARDs to dm-clone, short-circuiting hydration of unused blocks (see concepts/trim-discard-integration).
  6. Destination boots a new Fly Machine attached to the clone device.
  7. Hydration proceeds in background via kcopyd; user reads that miss the local clone fall through to the source; writes go only to the clone.
  8. When hydration completes, dm-clone is converted to a simple linear device (source is no longer needed) and the source worker can reclaim the original Volume.
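The steps above can be sketched as a destination-side command sequence. Everything here is a hedged sketch: the IQN, addresses, and device names are hypothetical, the commands need root and real block devices, and the actual orchestration happens inside flyd's FSMs rather than a script:

```shell
# 3. Attach the volume exported by the source worker (hypothetical IQN/IP).
iscsiadm -m node -T iqn.2024-01.example:src-vol -p 10.0.0.1 --login

# 4. Build the dm-clone device:
#    clone <metadata dev> <local clone dev> <remote source dev> <region size>
dmsetup create vol-clone --table \
  "0 419430400 clone /dev/meta /dev/local-clone /dev/sdb 8"

# 5. Open the decrypted view and trim it, so dm-clone never hydrates
#    blocks the filesystem reports as unused.
cryptsetup open /dev/mapper/vol-clone vol-plain
mount /dev/mapper/vol-plain /mnt/vol && fstrim /mnt/vol

# 6-7. Boot the Machine against the clone; kcopyd hydrates in the
#      background while missed reads fall through to /dev/sdb.

# 8. Once `dmsetup status vol-clone` shows hydrated == total regions,
#    swap in a plain linear table and detach the source.
dmsetup suspend vol-clone
dmsetup load vol-clone --table "0 419430400 linear /dev/local-clone 0"
dmsetup resume vol-clone
iscsiadm -m node -T iqn.2024-01.example:src-vol -p 10.0.0.1 --logout
```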

Known uses

  • Fly.io (2024) — the canonical instance, and the Making Machines Move post's entire subject. By summer 2024, operators could "pull 'drain this host' out of their toolbelt without much ceremony."
  • AWS EC2 live migration — Amazon uses a related approach at the hypervisor tier, though without a comparably detailed public walkthrough.

Consequences

Upsides:

  • Interruption time is decoupled from volume size.
  • The destination host doesn't need a pre-provisioned full copy of the volume.
  • Compatible with sparse volumes via TRIM short-circuiting.
  • Tolerant of network disruption if the network block protocol handles reconnection robustly (iSCSI did; NBD did not, in Fly's experience).

Downsides:

  • Reads of un-hydrated blocks fall through to the network, so p99 read latency is elevated until hydration completes.
  • Source worker stays busy serving reads until hydration completes; you don't get the full benefit of drain immediately.
  • Fleet-wide config skew (cryptsetup versions, filesystem options, mount flags) surfaces here and has to be solved via metadata transfer.
  • Orchestration-state invariants break — any system that assumed "worker = source of truth for workload location" has to be redesigned (Fly's Corrosion case).
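Because the source stays busy until hydration completes, progress monitoring decides when the drain actually finishes. A minimal sketch that pulls the hydrated/total region counter out of a `dmsetup status` line; the field positions follow the kernel's dm-clone status format, and the sample line is fabricated:

```shell
# A dm-clone status line is shaped like:
#   <start> <len> clone <meta blk size> <used>/<total meta blocks> \
#     <region size> <hydrated>/<total regions> <hydrating> ...
# Field 7 is the hydration counter; everything after it is ignored here.
hydration_pct() {
  echo "$1" | awk '{ split($7, a, "/"); printf "%.1f", 100 * a[1] / a[2] }'
}

# Fabricated sample: 104,320 of 204,800 regions hydrated.
hydration_pct "0 419430400 clone 8 23/2048 8 104320/204800 0 0 0"   # -> 50.9
```

In practice this would be polled (e.g. from the orchestrator) until it reaches 100.0, at which point the clone can be converted to a linear device and the source Volume reclaimed.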

Shape-adjacent patterns

Seen in
