PATTERN Cited by 1 source
Async block-clone for stateful migration¶
Intent¶
Relocate a stateful workload (VM + attached large volume) from one host to another with near-stateless interruption time, by booting the target instantly against a lazily-hydrated clone that falls through to the source over the network for un-fetched blocks.
When to use¶
- The workload has a locally-attached volume the destination host doesn't already have — no SAN fabric to just remount.
- The volume is large enough that full pre-copy would produce unacceptable interruption time (multi-GB with sub-minute interruption budgets).
- Writes during migration can't be lost (rules out `copy → boot → kill`).
- The source and destination can talk over a network block protocol: iSCSI, NFS-over-block, NBD, or a proprietary protocol.
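
The volume-size criterion above comes down to simple arithmetic: with a full pre-copy, the interruption lasts at least as long as the copy itself. The sketch below is only an illustration; the function names, the 1 GiB/s effective copy rate, and the 60-second budget are assumptions, not figures from the source.

```python
GIB = 1024 ** 3


def precopy_seconds(volume_bytes: int, link_bytes_per_sec: float) -> float:
    """Time to copy the whole volume while the workload is stopped."""
    return volume_bytes / link_bytes_per_sec


def needs_async_clone(volume_bytes: int, link_bytes_per_sec: float,
                      budget_seconds: float) -> bool:
    """True when a stop, copy, boot migration would exceed the interruption budget."""
    return precopy_seconds(volume_bytes, link_bytes_per_sec) > budget_seconds


# Hypothetical numbers: 1 GiB/s effective copy rate, 60 s interruption budget.
print(precopy_seconds(100 * GIB, 1 * GIB))          # 100.0
print(needs_async_clone(100 * GIB, 1 * GIB, 60.0))  # True
print(needs_async_clone(10 * GIB, 1 * GIB, 60.0))   # False
```

With an async clone the interruption no longer depends on `volume_bytes` at all, which is the point of the pattern.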
Structure¶
Components (canonical Fly.io stack from Making Machines Move):
| Role | Fly.io instance | Generic |
|---|---|---|
| Source block device on drain-target worker | Fly Volume | Locally-attached block device |
| Network block protocol | iSCSI (over the 6PN mesh; see systems/fly-wireguard-mesh) | Any network block protocol (iSCSI, NBD, or proprietary) |
| Kernel async-clone target | `dm-clone` | Any async-clone block device |
| Metadata bitmap | `dm-clone` metadata device | Hydrate-state bitmap |
| Orchestrator | flyd BoltDB FSMs | Platform orchestrator |
| New workload | Fly Machine on target worker | VM / container |
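
The three block-device roles in the table (remote source, local clone, metadata) come together in a single device-mapper table line. The sketch below builds that line in the layout described by the Linux kernel's `dm-clone` documentation (`clone <metadata dev> <destination dev> <source dev> <region size>`), plus the `linear` table that can replace it once hydration has finished. The helper names and device paths are hypothetical, and nothing here is taken from flyd; check the region-size limits against the documentation for your kernel.

```python
def clone_table(size_sectors: int, metadata_dev: str, dest_dev: str,
                source_dev: str, region_sectors: int = 8) -> str:
    """Build a dm-clone table line: start, length, target name, then the devices."""
    # Kernel docs: region size is a power of two between 8 sectors (4 KiB)
    # and 2097152 sectors (1 GiB).
    out_of_range = region_sectors < 8 or region_sectors > 2097152
    not_power_of_two = region_sectors & (region_sectors - 1) != 0
    if out_of_range or not_power_of_two:
        raise ValueError(f"invalid region size: {region_sectors}")
    return (f"0 {size_sectors} clone "
            f"{metadata_dev} {dest_dev} {source_dev} {region_sectors}")


def linear_table(size_sectors: int, dest_dev: str) -> str:
    """Table that maps the device 1:1 onto the local clone after hydration."""
    return f"0 {size_sectors} linear {dest_dev} 0"


# Hypothetical device paths for a 1 GiB volume (2097152 sectors of 512 bytes).
print(clone_table(2097152, "/dev/mapper/meta", "/dev/mapper/local", "/dev/sdx"))
print(linear_table(2097152, "/dev/mapper/local"))
```

Strings like these would be passed to `dmsetup create --table` and, for the switch to `linear`, to `dmsetup load` between a `suspend` and a `resume`.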
Steps¶
- flyd on source (`worker-xx-cdg1-1`) stops the Fly Machine (kills the writer, closing the last-write window).
- flyd sends an RPC to flyd on destination (`cdg1-2`) with migration metadata: critically, source LUKS2 header size (see patterns/fsm-rpc-for-config-metadata-transfer) and any other fleet-skew parameters the target needs.
- Destination attaches the source Volume over the network block protocol (iSCSI target on the source side, initiator on the destination side).
- Destination creates a `dm-clone` instance with the remote source device + a fresh local clone device + a metadata device.
- Destination runs `fstrim` on the decrypted view of the source to issue `DISCARD`s to `dm-clone`, short-circuiting hydration of unused blocks (see concepts/trim-discard-integration).
- Destination boots a new Fly Machine attached to the clone device.
- Hydration proceeds in background via `kcopyd`; user reads that miss the local clone fall through to the source; writes go only to the clone.
- When hydration completes, `dm-clone` is converted to a simple linear device (source is no longer needed) and the source worker can reclaim the original Volume.
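
The block-level behaviour in the steps above (reads falling through, writes staying local, discards skipping hydration, a background copier filling in the rest) can be modelled in a few lines. This is a toy in-memory sketch, not the `dm-clone` implementation: the class and method names are hypothetical, each "block" is a whole `bytes` value, and only whole-block writes are modelled.

```python
class AsyncClone:
    """Toy model of an async block clone backed by a remote source."""

    def __init__(self, source: list[bytes]):
        self.source = source                   # remote device, never written
        self.local = [b""] * len(source)       # fresh local clone device
        self.hydrated = [False] * len(source)  # metadata bitmap
        self.remote_reads = 0                  # counts trips over the network

    def _fetch(self, i: int) -> bytes:
        self.remote_reads += 1
        return self.source[i]

    def read(self, i: int) -> bytes:
        # A miss on the local clone falls through to the source.
        return self.local[i] if self.hydrated[i] else self._fetch(i)

    def write(self, i: int, data: bytes) -> None:
        # Writes go only to the clone; the block no longer needs hydrating.
        self.local[i] = data
        self.hydrated[i] = True

    def discard(self, i: int) -> None:
        # A discarded block holds nothing worth copying, so mark it done.
        self.local[i] = b""
        self.hydrated[i] = True

    def hydrate_step(self) -> bool:
        """Copy one un-hydrated block; return False when nothing is left."""
        for i, done in enumerate(self.hydrated):
            if not done:
                self.local[i] = self._fetch(i)
                self.hydrated[i] = True
                return True
        return False


clone = AsyncClone([b"A", b"B", b"", b"D"])
clone.discard(2)            # trim reports block 2 as unused
clone.write(1, b"b2")       # new write lands on the clone only
first = clone.read(0)       # not hydrated yet, so served by the source
while clone.hydrate_step():
    pass                    # background hydration of blocks 0 and 3
print(first, clone.local, clone.remote_reads)
```

Once `hydrate_step` returns `False`, every block is local and the source is no longer consulted, which corresponds to the point where the clone can be swapped for a plain linear mapping.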
Known uses¶
- Fly.io (2024): canonical wiki instance, and the entire subject of the Making Machines Move post. By summer 2024 the team could "pull 'drain this host' out of their toolbelt without much ceremony".
- AWS live-migration on EC2 — Amazon's live-migration of EC2 instances uses a related approach at the hypervisor tier, though AWS doesn't publish as detailed a walkthrough.
Consequences¶
Upsides:
- Interruption time is decoupled from volume size.
- The destination host doesn't need a pre-provisioned full copy.
- Compatible with sparse volumes via TRIM short-circuiting.
- Tolerates network disruption if the network block protocol is robust (iSCSI; not NBD per Fly's experience).
Downsides:
- Reads on a partially hydrated clone fall through to the network for un-hydrated blocks, so p99 read latency is elevated during hydration.
- Source worker stays busy serving reads until hydration completes; you don't get the full benefit of drain immediately.
- Fleet-wide config skew (cryptsetup versions, filesystem options, mount flags) surfaces here and has to be solved via metadata transfer.
- Orchestration-state invariants break — any system that assumed "worker = source of truth for workload location" has to be redesigned (Fly's Corrosion case).
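
To see why tail latency suffers more than the average during hydration, consider a deliberately crude model. It assumes reads are spread uniformly over blocks and that there are only two fixed latencies, local and remote; both assumptions and all numbers below are hypothetical, not measurements from the source.

```python
def mean_read_latency(hydrated_fraction: float, local_ms: float,
                      remote_ms: float) -> float:
    """Expected latency when a read is local with probability hydrated_fraction."""
    return hydrated_fraction * local_ms + (1.0 - hydrated_fraction) * remote_ms


def p99_read_latency(hydrated_fraction: float, local_ms: float,
                     remote_ms: float) -> float:
    """With two latency classes, p99 is remote once over 1% of reads go remote."""
    return remote_ms if (1.0 - hydrated_fraction) > 0.01 else local_ms


# Hypothetical latencies: 1 ms local, 5 ms over the network.
print(mean_read_latency(0.5, 1.0, 5.0))   # 3.0
print(p99_read_latency(0.95, 1.0, 5.0))   # 5.0 (still dominated by remote reads)
print(p99_read_latency(0.995, 1.0, 5.0))  # 1.0
```

Under these assumptions the mean improves steadily as hydration progresses, while p99 stays at the remote figure until the clone is almost fully hydrated.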
Shape-adjacent patterns¶
- Git-repo tier: patterns/blobless-clone-lazy-hydrate — Cloudflare Artifacts ships clones of large repositories with blobs materialised on read.
- Fleet-drain shape: patterns/temporary-san-for-fleet-drain — the fleet-level framing of what async block-clone enables.
Seen in¶
Related¶
- concepts/block-level-async-clone — The primitive.
- concepts/kill-copy-boot-migration-tradeoff — The problem.
- concepts/fleet-drain-operation — The operational use case.
- concepts/trim-discard-integration — The sparse-volume optimisation.
- patterns/temporary-san-for-fleet-drain — Fleet-level framing.
- patterns/fsm-rpc-for-config-metadata-transfer — How metadata travels with the migration.
- patterns/blobless-clone-lazy-hydrate — Git-tier shape sibling.