PATTERN
Temporary SAN for fleet drain¶
Intent¶
Turn a draining worker's locally-attached storage into a network-accessible block device for the duration of the drain, so that target workers elsewhere in the fleet can pull blocks from it on demand to satisfy reads on freshly-booted replica workloads. In short: spin up a SAN you didn't have the rest of the year, just while you're draining.
When to use¶
- Your normal storage tier is local-attached (not a SAN).
- You need to drain workers occasionally but not continuously.
- Full permanent SAN infrastructure in every region is not affordable or not yet justified.
- You have a mesh network between workers that can carry block-device traffic (WireGuard, VPC peering, cross-AZ bandwidth).
Structure¶
The post's canonical phrasing:
To drain a worker with minimal downtime and no lost data, we turn workers into temporary SANs, serving the volumes we need to drain to fresh-booted replica Fly Machines on a bunch of "target" physicals. Those SANs — combinations of dm-clone, iSCSI, and our flyd orchestrator — track the blocks copied from the origin, copying each one exactly once and cleaning up when the original volume has been fully copied.
Components:
- Source workers become iSCSI targets for the Volumes being drained.
- Destination workers become iSCSI initiators and stack dm-clone on top of the remote device.
- Orchestrator (flyd) tracks which Volumes are being drained from where to where.
- The SAN exists only during the drain; when hydration completes the target no longer needs the source, and the temporary SAN shape evaporates.
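The copy-each-block-exactly-once behavior that dm-clone contributes can be sketched in miniature: a per-region hydration bitmap, where a read of a cold region pulls it over the network on demand and a background sweep hydrates the rest, skipping anything already copied. This is a toy model, not dm-clone's actual implementation; names like `_pull_region` and the list-of-blocks device are illustrative.

```python
class CloneDevice:
    """Toy model of dm-clone's copy-once hydration tracking.

    `source` stands in for the remote (iSCSI) device on the draining
    worker; `dest` for the local NVMe on the target worker.
    """

    def __init__(self, source: list[bytes], dest: list[bytes]):
        self.source = source               # remote blocks (read-only)
        self.dest = dest                   # local blocks, authoritative once hydrated
        self.hydrated = [False] * len(source)
        self.network_reads = 0             # how many times we crossed the wire

    def _pull_region(self, i: int) -> None:
        # Copy a region from the remote source at most once.
        if not self.hydrated[i]:
            self.dest[i] = self.source[i]
            self.hydrated[i] = True
            self.network_reads += 1

    def read(self, i: int) -> bytes:
        # On-demand hydration: reading a cold region pulls it first.
        self._pull_region(i)
        return self.dest[i]

    def background_hydrate(self) -> None:
        # The background sweep skips regions already pulled by reads,
        # so every region crosses the network exactly once.
        for i in range(len(self.source)):
            self._pull_region(i)

    def fully_hydrated(self) -> bool:
        # Once true, the iSCSI session (and the temporary SAN) can go away.
        return all(self.hydrated)


src = [b"a", b"b", b"c", b"d"]
dev = CloneDevice(src, [b""] * 4)
dev.read(2)                # demand-fault one region
dev.background_hydrate()   # sweep the remainder
assert dev.network_reads == 4 and dev.fully_hydrated()
```

The key property the pattern relies on is the last assertion: however reads interleave with the background copy, each block is fetched once, which bounds the drain's network cost at one full pass over the volume.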
Consequences¶
Upsides:
- You get drain without running a SAN full-time. Hardware cost stays at local-NVMe levels; SAN-like capability appears on demand.
- The fabric is regional-or-less — iSCSI traffic rides the same mesh (Fly's 6PN) that already connects workers.
- Scales naturally with fleet size — any worker can be an iSCSI source or target when needed.
Downsides:
- During drain, the source worker is busier than it was before (serving both its remaining workloads and an iSCSI target stream).
- Network disruption during drain is more impactful — reads from partially-hydrated clones depend on the network block protocol staying up. This is where Fly's NBD-to-iSCSI switch mattered.
- Orchestrator state-tracking gets more complex — flyd's FSMs have to cope with partial-hydration / failed-migration / cleanup scenarios.
- You still need a long-term plan. The post gestures at LSVD as Fly's medium-term evolution away from treating local NVMe as the durable tier.
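The state-tracking burden on the orchestrator can be made concrete with a toy per-volume drain FSM. This is not flyd's actual state machine; the state names and transitions here are assumptions for illustration, chosen to cover the partial-hydration, failed-migration, and cleanup paths listed above.

```python
from enum import Enum, auto


class DrainState(Enum):
    EXPORTING = auto()     # source is serving the volume as an iSCSI target
    HYDRATING = auto()     # destination is pulling blocks through dm-clone
    FAILED = auto()        # migration aborted mid-hydration
    CLEANING_UP = auto()   # tearing down the temporary SAN
    DONE = auto()          # destination is authoritative; source can drain


# Legal transitions, including the unhappy paths an orchestrator has
# to cope with: a drain can fail before or during hydration, and every
# path (success or failure) must pass through cleanup.
TRANSITIONS = {
    DrainState.EXPORTING: {DrainState.HYDRATING, DrainState.FAILED},
    DrainState.HYDRATING: {DrainState.CLEANING_UP, DrainState.FAILED},
    DrainState.FAILED: {DrainState.CLEANING_UP},
    DrainState.CLEANING_UP: {DrainState.DONE},
    DrainState.DONE: set(),
}


class VolumeDrain:
    """Tracks one volume's drain from a source worker to a destination."""

    def __init__(self, volume: str, source: str, dest: str):
        self.volume, self.source, self.dest = volume, source, dest
        self.state = DrainState.EXPORTING

    def advance(self, to: DrainState) -> None:
        if to not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {to}")
        self.state = to


d = VolumeDrain("vol_123", "worker-a", "worker-b")
d.advance(DrainState.HYDRATING)
d.advance(DrainState.CLEANING_UP)   # hydration complete
d.advance(DrainState.DONE)
```

Even this toy version shows why the complexity is unavoidable: cleanup is a state of its own rather than an afterthought, because a half-hydrated clone left behind by a failed drain still holds an open iSCSI session on the source.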
Relation to classical SAN¶
A classical SAN (EBS, FlashArray, Ceph) is:
- Always on. Every compute host sees the SAN all the time.
- Durable. The SAN is the authoritative copy of the data.
- Expensive. Fabric + controllers + replication across AZs.
Temporary-SAN-for-drain is:
- On demand. Only exists when a drain is happening.
- Not durable in the SAN layer. Durability is still the local NVMe's responsibility; the SAN is a transport for migration, not a store.
- Cheap. Reuses existing mesh + kernel DM + userspace iSCSI.
Known uses¶
- Fly.io (2024) — Canonical wiki instance. The entire shape the 2024-07-30 post describes. "Worker physicals become temporary SANs serving volumes to fresh-booted replica Fly Machines."
Seen in¶
Related¶
- patterns/async-block-clone-for-stateful-migration — The per-workload migration recipe the temporary SAN enables.
- concepts/fleet-drain-operation — The use case.
- systems/iscsi — The chosen substrate.
- systems/dm-clone — The consumer on the target side.