SYSTEM Cited by 1 source

iSCSI (Internet Small Computer Systems Interface)

iSCSI is a network block-device protocol that encapsulates SCSI commands over TCP/IP. It lets a client ("initiator") treat a remote block device as if it were locally attached — reads, writes, discards, flushes all tunnel over TCP. iSCSI is the workhorse protocol of classical SAN fabrics.
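To make "encapsulates SCSI commands" concrete, here is a minimal sketch of how an initiator might frame a single SCSI READ(10) as an iSCSI SCSI Command PDU: a 48-byte Basic Header Segment whose last 16 bytes carry the SCSI CDB. The field layout follows the iSCSI RFC as I understand it. The function names, default sequence numbers, and the all-zero LUN are illustrative assumptions, and the sketch leaves out login, digests, and the response side.

```python
import struct

SCSI_COMMAND_OPCODE = 0x01  # iSCSI opcode for a SCSI Command PDU
FLAG_FINAL = 0x80           # F bit: last PDU of this command
FLAG_READ = 0x40            # R bit: data flows target -> initiator
ATTR_SIMPLE = 0x01          # task attribute: simple queueing


def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a SCSI READ(10) CDB, zero-padded to the 16-byte CDB field."""
    # opcode, flags, 32-bit LBA, group number, 16-bit block count, control
    cdb = struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)
    return cdb.ljust(16, b"\x00")


def scsi_command_pdu(
    lba: int,
    blocks: int,
    block_size: int = 512,   # assumed logical block size
    task_tag: int = 1,       # hypothetical Initiator Task Tag
    cmd_sn: int = 1,         # hypothetical command sequence number
    exp_stat_sn: int = 1,    # hypothetical expected status sequence number
    lun: bytes = bytes(8),   # LUN 0 encoded as eight zero bytes
) -> bytes:
    """Return the 48-byte header for a read of `blocks` blocks at `lba`."""
    header = struct.pack(
        ">BBHB3s8sIIII",
        SCSI_COMMAND_OPCODE,
        FLAG_FINAL | FLAG_READ | ATTR_SIMPLE,
        0,                        # reserved
        0,                        # TotalAHSLength: no additional headers
        (0).to_bytes(3, "big"),   # DataSegmentLength: a read sends no data
        lun,
        task_tag,
        blocks * block_size,      # Expected Data Transfer Length in bytes
        cmd_sn,
        exp_stat_sn,
    )
    return header + read10_cdb(lba, blocks)


pdu = scsi_command_pdu(lba=2048, blocks=8)
```

An initiator would write these 48 bytes to an established TCP connection to the target (conventionally port 3260) and then read back Data-In and SCSI Response PDUs; that exchange is omitted here.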

Role in Fly.io migrations

In the Making Machines Move post, iSCSI is the network block protocol Fly uses to expose a draining worker's source Volume as a remote block device on the target worker, so dm-clone can fall through to it for un-hydrated reads. This is the shape Fly calls "temporary SANs" (patterns/temporary-san-for-fleet-drain).
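The fall-through behaviour can be illustrated with a toy model. This is not dm-clone's implementation, and not Fly's code; it is a sketch that assumes a per-region "hydrated" flag, with an in-memory `source` buffer standing in for the iSCSI-attached remote Volume. The class name `ToyCloneDevice` and its counters are hypothetical.

```python
class ToyCloneDevice:
    """Toy model: serve hydrated regions locally, fall through otherwise."""

    def __init__(self, source: bytes, region_size: int):
        self.source = source                  # stands in for the remote device
        self.region_size = region_size
        self.local = bytearray(len(source))   # destination, initially empty
        regions = -(-len(source) // region_size)  # ceiling division
        self.hydrated = [False] * regions
        self.remote_reads = 0                 # counts trips to the "network"

    def _bounds(self, region: int) -> tuple[int, int]:
        start = region * self.region_size
        return start, min(start + self.region_size, len(self.source))

    def _read_remote(self, start: int, end: int) -> bytes:
        self.remote_reads += 1
        return self.source[start:end]

    def hydrate(self, region: int) -> None:
        """Copy one region from the source to local storage (background copy)."""
        if self.hydrated[region]:
            return
        start, end = self._bounds(region)
        self.local[start:end] = self._read_remote(start, end)
        self.hydrated[region] = True

    def read(self, offset: int, length: int) -> bytes:
        end = offset + length
        if offset < 0 or length < 0 or end > len(self.source):
            raise ValueError("read outside device bounds")
        out = bytearray()
        while offset < end:
            region = offset // self.region_size
            _, region_end = self._bounds(region)
            chunk_end = min(end, region_end)
            if self.hydrated[region]:
                out += self.local[offset:chunk_end]          # local hit
            else:
                out += self._read_remote(offset, chunk_end)  # fall through
            offset = chunk_end
        return bytes(out)


source = bytes(range(16))
dev = ToyCloneDevice(source, region_size=4)
data = dev.read(2, 4)   # spans two un-hydrated regions: two remote reads
dev.hydrate(0)          # background copy of region 0: one more remote read
again = dev.read(0, 4)  # region 0 is now served locally
```

In this model the remote device is only needed until every region is hydrated, which is what makes the SAN "temporary".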

Fly tried NBD first and switched:

We started out using nbd. But we kept getting stuck nbd kernel threads when there was any kind of network disruption. We're a global public cloud; network disruption happens. Honestly, we could have debugged our way through this. But it was simpler just to spike out an iSCSI implementation, observe that didn't get jammed up when the network hiccuped, and move on.

The decision is pragmatic, not theoretical: NBD is simpler to implement (you can write a userland server "in an afternoon, on top of a file or a SQLite database or S3") and the Linux kernel has native client support; iSCSI is older, more complicated, but battle-tested under adverse network conditions.

iSCSI traffic rides over Fly's internal 6PN WireGuard mesh (systems/fly-wireguard-mesh), the same substrate that carries every other inter-worker RPC.

Seen in

  • sources/2024-07-30-flyio-making-machines-move — The canonical wiki source for iSCSI in a cloud-platform internal-SAN role. iSCSI is how the source Volume on worker-xx-cdg1-1 appears as a block device on worker-xx-cdg1-2 so dm-clone can fall through to it.

Caveats

  • The post doesn't disclose per-region iSCSI concurrency ceilings or bandwidth-sharing policy when many migrations run simultaneously.
  • The post does not discuss iSCSI-level mutual authentication or integrity checking on the 6PN mesh (the WireGuard mesh itself provides confidentiality and integrity at the packet layer).
  • iSCSI has a reputation for complexity in enterprise-SAN deployments; Fly's remark — "relatively complicated" — is consistent with that, but they accept the cost for robustness.