iSCSI (Internet Small Computer Systems Interface)
iSCSI is a network block-device protocol that encapsulates SCSI commands over TCP/IP. It lets a client (the "initiator") treat a remote block device as if it were locally attached: reads, writes, discards, and flushes all tunnel over TCP. iSCSI is a workhorse protocol of classical IP-based SANs.
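To make the encapsulation concrete, the following Python sketch packs a SCSI READ(10) command into the 48-byte Basic Header Segment of an iSCSI SCSI Command PDU, following the field layout in RFC 7143. It is illustrative only: the function names are hypothetical, and it covers none of login, parameter negotiation, or digests that a real initiator handles.

```python
import struct

ISCSI_OP_SCSI_CMD = 0x01  # opcode for a SCSI Command PDU
FLAG_FINAL = 0x80         # F bit: last PDU of this command
FLAG_READ = 0x40          # R bit: data flows target -> initiator


def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a SCSI READ(10) CDB, zero-padded to the 16 bytes the header reserves."""
    # opcode, flags, 32-bit LBA, group number, 16-bit transfer length, control
    cdb = struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)
    return cdb.ljust(16, b"\x00")


def scsi_read_pdu(lun: bytes, task_tag: int, lba: int, blocks: int,
                  block_size: int, cmd_sn: int, exp_stat_sn: int) -> bytes:
    """Return the 48-byte Basic Header Segment for a READ(10) command."""
    if len(lun) != 8:
        raise ValueError("LUN field must be exactly 8 bytes")
    header = struct.pack(
        ">BB2xB3s8sIIII",
        ISCSI_OP_SCSI_CMD,
        FLAG_FINAL | FLAG_READ,
        0,                    # TotalAHSLength: no additional header segments
        b"\x00\x00\x00",      # DataSegmentLength: a read carries no immediate data
        lun,
        task_tag,             # Initiator Task Tag
        blocks * block_size,  # Expected Data Transfer Length in bytes
        cmd_sn,
        exp_stat_sn,
    )
    return header + read10_cdb(lba, blocks)


pdu = scsi_read_pdu(bytes(8), task_tag=1, lba=2048, blocks=8,
                    block_size=512, cmd_sn=1, exp_stat_sn=1)
```

After a successful login, an initiator would write these 48 bytes to its TCP connection to the target (conventionally port 3260) and wait for Data-In PDUs carrying the requested blocks.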
Role in Fly.io migrations
In the Making Machines Move post, iSCSI is the network block protocol Fly uses to expose a draining worker's source Volume as a remote block device on the target worker, so dm-clone can fall through to it for un-hydrated reads. Fly calls this shape "temporary SANs" (patterns/temporary-san-for-fleet-drain).
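The fall-through behaviour can be sketched as a toy model in Python. This is not dm-clone's actual implementation: the `CloneDevice` class, its methods, and the region size are hypothetical, and in-memory buffers stand in for the iSCSI-attached source Volume and the local destination.

```python
import io


class CloneDevice:
    """Toy clone device: serve hydrated regions locally, others from the source."""

    def __init__(self, source, dest, region_size, num_regions):
        self.source = source          # stand-in for the remote (iSCSI) device
        self.dest = dest              # stand-in for the local volume
        self.region_size = region_size
        self.hydrated = [False] * num_regions

    def hydrate(self, region):
        """Copy one region from source to dest and mark it as local."""
        off = region * self.region_size
        self.source.seek(off)
        data = self.source.read(self.region_size)
        self.dest.seek(off)
        self.dest.write(data)
        self.hydrated[region] = True

    def read(self, offset, length):
        """Read a byte range, choosing the backing device region by region."""
        out = bytearray()
        end = offset + length
        while offset < end:
            region = offset // self.region_size
            region_end = (region + 1) * self.region_size
            n = min(end, region_end) - offset
            # Un-hydrated regions fall through to the remote source.
            backing = self.dest if self.hydrated[region] else self.source
            backing.seek(offset)
            out += backing.read(n)
            offset += n
        return bytes(out)


source = io.BytesIO(b"A" * 4 + b"B" * 4)
dest = io.BytesIO(bytes(8))
dev = CloneDevice(source, dest, region_size=4, num_regions=2)
before = dev.read(2, 4)  # both regions un-hydrated: served from source
dev.hydrate(0)
after = dev.read(0, 8)   # region 0 from dest, region 1 still from source
```

In this model, every read of an un-hydrated region crosses the network, which is why the behaviour of the block protocol under network disruption matters so much for the migration.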
Fly tried NBD first and switched:
We started out using nbd. But we kept getting stuck nbd kernel threads when there was any kind of network disruption. We're a global public cloud; network disruption happens. Honestly, we could have debugged our way through this. But it was simpler just to spike out an iSCSI implementation, observe that didn't get jammed up when the network hiccuped, and move on.
The decision is pragmatic, not theoretical: NBD is simpler to implement (you can write a userland server "in an afternoon, on top of a file or a SQLite database or S3") and the Linux kernel has native client support; iSCSI is more complicated, but it is battle-tested under adverse network conditions and did not jam up when Fly's network hiccuped.
iSCSI traffic rides over Fly's internal 6PN WireGuard mesh (systems/fly-wireguard-mesh), the same substrate that carries every other inter-worker RPC.
Seen in
- sources/2024-07-30-flyio-making-machines-move — The canonical
wiki source for iSCSI in a cloud-platform internal-SAN role.
iSCSI is how the source Volume on
worker-xx-cdg1-1 appears as a block device on worker-xx-cdg1-2 so dm-clone can fall through to it.
Caveats
- The post doesn't disclose per-region iSCSI concurrency ceilings or bandwidth-sharing policy when many migrations run simultaneously.
- The post does not discuss iSCSI-level authentication (for example CHAP) or integrity checking (header and data digests) on the 6PN mesh; the WireGuard mesh itself provides confidentiality and integrity at the packet layer.
- iSCSI has a reputation for complexity in enterprise-SAN deployments; Fly's remark — "relatively complicated" — is consistent with that, but they accept the cost for robustness.
Related
- systems/nbd — The protocol Fly tried first.
- systems/dm-clone — The consumer of the remote block device.
- systems/fly-wireguard-mesh — The substrate the iSCSI connection rides over.
- patterns/temporary-san-for-fleet-drain — The fleet-level shape Fly uses iSCSI for.