SYSTEM Cited by 1 source
NBD (Network Block Device)¶
NBD is the Linux kernel's native network block-device protocol, a simpler alternative to iSCSI. The Linux kernel includes a built-in NBD client, and writing an NBD server is famously easy: you can do it "in an afternoon, on top of a file or a SQLite database or S3, and the Linux kernel could mount it as a drive" (Fly.io, 2024-07-30).
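To give a sense of why an NBD server is considered easy to write, here is a minimal sketch of the transmission phase only: parsing one request and producing a simple reply. It is not taken from the Fly.io post. The magic numbers and header layouts follow the public NBD protocol description; the in-memory `bytearray` backing store, the `handle_request` helper, and the choice to answer every failure with `EINVAL` are simplifying assumptions made for illustration. The handshake, option negotiation, flush, trim, and disconnect handling that a real server needs are left out.

```python
import struct

# Constants from the NBD protocol description (transmission phase).
NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ, NBD_CMD_WRITE = 0, 1
NBD_EINVAL = 22

# Big-endian wire layouts.
REQUEST = struct.Struct(">IHHQQI")  # magic, flags, type, cookie, offset, length
REPLY = struct.Struct(">IIQ")       # magic, error, cookie


def handle_request(disk: bytearray, header: bytes, payload: bytes = b"") -> bytes:
    """Serve one READ or WRITE against an in-memory 'disk' (hypothetical helper)."""
    magic, _flags, cmd, cookie, offset, length = REQUEST.unpack(header)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("bad request magic")

    in_range = offset + length <= len(disk)
    if cmd == NBD_CMD_READ and in_range:
        # A successful read reply is the header followed by the requested bytes.
        data = bytes(disk[offset:offset + length])
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, cookie) + data
    if cmd == NBD_CMD_WRITE and in_range and len(payload) == length:
        disk[offset:offset + length] = payload
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, cookie)

    # Simplification: every unsupported or out-of-range request gets EINVAL.
    return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, NBD_EINVAL, cookie)
```

Swapping the `bytearray` for a file, a SQLite table, or an object store changes only the two slice operations, which is the property the quote above is pointing at.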
Role in Fly.io migrations (tried, then abandoned)¶
In Making Machines Move, Fly.io initially adopted NBD as the network block protocol for serving source Volumes to target workers during a clone-based migration. They switched to iSCSI after repeated production issues:
But we kept getting stuck nbd kernel threads when there was any kind of network disruption. We're a global public cloud; network disruption happens. Honestly, we could have debugged our way through this. But it was simpler just to spike out an iSCSI implementation, observe that didn't get jammed up when the network hiccuped, and move on.
The explicit framing is worth noting: Fly says it could have debugged NBD, but that switching was simpler. The usual playbook is to fix the dependency and upstream the patch; Fly chose the opposite, switching to a more robust alternative and leaving the NBD fix to someone else.
Seen in¶
- sources/2024-07-30-flyio-making-machines-move — Named as Fly's initial choice; replaced by iSCSI after stuck-kernel-thread problems under network disruption.
Caveats¶
- The post doesn't detail how the NBD kernel threads got stuck — whether it was a protocol-state-machine issue, a timeout configuration problem, a specific bug in the kernel client, or something else.
- This isn't a blanket indictment of NBD. Other production users (QEMU, distributed filesystems) use NBD successfully. Fly's case is specifically a globally distributed deployment, where network disruption is routine.
Related¶
- systems/iscsi — What Fly settled on.
- systems/dm-clone — The kernel-side consumer that would have been fed by NBD.