PATTERN
Userspace FTL via io_uring + ublk¶
Context¶
The Flash Translation Layer (FTL) in an SSD translates host LBAs to physical NAND addresses, performs wear-leveling, garbage collects, and manages bad blocks. Traditionally the FTL runs inside the drive's firmware — an embedded processor + DRAM on the SSD handles the entire policy. This is simple for the host but opaque: the host has no visibility into GC cycles, wear distribution, or write-coalescing decisions, and cannot influence them.
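The FTL responsibilities above (out-of-place writes, L2P mapping, wear-leveling, GC victim selection) fit in a few lines. A toy sketch for intuition only, not the Meta/Pure implementation; `TinyFTL` and all sizes are made up:

```python
# Toy FTL: out-of-place writes into an active block, a logical-to-physical
# mapping table, erase-count-aware block allocation (wear leveling), and a
# greedy GC victim pick. Illustrative only; real FTLs track far more state.

PAGES_PER_BLOCK = 4
NUM_BLOCKS = 8

class TinyFTL:
    def __init__(self):
        self.l2p = {}                       # logical page -> (block, page)
        self.valid = {}                     # (block, page) -> logical page
        self.erase_count = [0] * NUM_BLOCKS
        self.free_blocks = list(range(NUM_BLOCKS))
        self.active = self.free_blocks.pop(0)
        self.next_page = 0

    def _alloc_page(self):
        if self.next_page == PAGES_PER_BLOCK:
            # Open a new block; prefer the least-worn one (wear leveling).
            self.free_blocks.sort(key=lambda b: self.erase_count[b])
            self.active = self.free_blocks.pop(0)
            self.next_page = 0
        ppa = (self.active, self.next_page)
        self.next_page += 1
        return ppa

    def write(self, lpa):
        # NAND forbids in-place update: invalidate the old copy, append a new one.
        old = self.l2p.get(lpa)
        if old is not None:
            del self.valid[old]
        ppa = self._alloc_page()
        self.l2p[lpa] = ppa
        self.valid[ppa] = lpa

    def read(self, lpa):
        return self.l2p.get(lpa)

    def gc_victim(self):
        # Greedy policy: the block with the fewest valid pages is cheapest
        # to relocate and erase.
        counts = {}
        for (blk, _pg) in self.valid:
            counts[blk] = counts.get(blk, 0) + 1
        candidates = [b for b in range(NUM_BLOCKS)
                      if b != self.active and b not in self.free_blocks]
        if not candidates:
            return None
        return min(candidates, key=lambda b: counts.get(b, 0))
```

Rewriting the same four logical pages twice fills one block with stale pages, and `gc_victim()` picks it immediately; this is exactly the policy loop that moves from firmware to the host in this pattern.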
At the density + asymmetry levels of modern QLC flash (systems/qlc-flash), this opacity becomes a load-bearing cost for the host:
- GC running during a latency-sensitive read window stalls the read.
- Write-coalescing policy affects R/W arbitration — only the host knows whether a pending write is bulk or latency-dependent.
- Firmware iteration cycle is months; software iteration is days.
The pattern¶
Move the FTL to userspace on the host. Expose the storage to applications as a regular block device via Linux's ublk (userspace block device driver) framework, which forwards block I/O from the kernel to a userspace daemon. Use io_uring as the zero-copy, high-throughput ring-buffer path between the kernel and the userspace daemon.
┌──────────────────────────────────────────┐
│ Application │
├──────────────────────────────────────────┤
│ Kernel block device (regular) │
├──────────────────────────────────────────┤
│ ublk │ ← syscall-free path
├──────────────────────────────────────────┤
│ io_uring (shared ring buffer) │
├──────────────────────────────────────────┤
│ Userspace FTL daemon │ ← wear leveling, GC, mapping
├──────────────────────────────────────────┤
│ Raw flash device | NVMe block device │
└──────────────────────────────────────────┘
- ublk (Linux 6.0+) gives apps a regular block-device interface. No vendor library needed at the app layer.
- io_uring is the submit/complete ring-buffer primitive (Linux 5.1+). Sharing pages between kernel and userspace enables zero-copy for DMA-able buffers.
- The userspace FTL daemon owns wear-leveling, GC scheduling, mapping-table management — policy in userspace, data-path via io_uring.
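The daemon's control flow has a simple shape: drain forwarded block requests from a shared ring, apply FTL policy, post completions. A schematic sketch with in-process deques standing in for the real ublk/io_uring rings; `FakeRing` and the request fields are illustrative, and real ublk requests carry sector/length/op descriptors from the kernel:

```python
# Schematic of the userspace daemon loop. Deques simulate the io_uring
# submission/completion rings; dicts simulate the mapping table and media.

from collections import deque

class FakeRing:
    """Stand-in for an io_uring submission/completion ring pair."""
    def __init__(self):
        self.sq = deque()   # kernel -> daemon (forwarded block requests)
        self.cq = deque()   # daemon -> kernel (completions)

def daemon_loop(ring, mapping, media):
    """Drain the submission ring, apply FTL policy, complete each request."""
    while ring.sq:
        req = ring.sq.popleft()   # e.g. {"op": "read", "lba": 7, "tag": 1}
        if req["op"] == "read":
            # Host-side FTL translation happens before touching the media.
            ppa = mapping.get(req["lba"], req["lba"])
            ring.cq.append({"tag": req["tag"], "result": media.get(ppa)})
        else:
            # Writes go out-of-place; the FTL updates the mapping table.
            ppa = max(media, default=-1) + 1
            media[ppa] = req["data"]
            mapping[req["lba"]] = ppa
            ring.cq.append({"tag": req["tag"], "result": None})
```

The point of the sketch is the placement of policy: every request crosses the daemon, so translation, pacing, and GC scheduling all have a hook on the data path, while the real zero-copy transport is io_uring's shared pages rather than Python deques.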
Canonical instance¶
Meta's 2025-03-04 QLC post discloses Pure Storage's DirectFlash Module (DFM) + DirectFlash software using exactly this stack:
"The software stack in Pure Storage's solutions uses Linux userspace block device driver (ublk) devices over io_uring to both expose the storage as a regular block device and enable zero copy for data copy elimination — as well as talk to their userspace FTL (DirectFlash software) in the background. For other vendors, the stack uses io_uring to directly interact with the NVMe block device."
Two deployment shapes in the same Meta server:
- DFM path: ublk → io_uring → userspace DirectFlash FTL → DFM (raw flash).
- Standard NVMe QLC path: io_uring → NVMe block device (firmware FTL).
Both coexist in Meta's rack design because both fit the U.2-15mm slot.
Why this pattern works for asymmetric media¶
The R/W-asymmetry problem (concepts/qlc-read-write-asymmetry) is only solvable if the scheduler has full visibility into pending writes. On a firmware-FTL drive, writes may be internally queued and the kernel has no view into that state. The rate controller pattern is effectively blocked.
With host-side FTL, every write is visible; the userspace daemon can throttle, pace, coalesce, or prioritise before dispatching to the media. This is the composition: userspace FTL + rate controller is the software-side answer to QLC's media-level asymmetry.
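The composition can be sketched as a scheduler that exploits exactly that visibility: latency-sensitive writes dispatch immediately, bulk writes are paced by a token bucket. A toy model, assuming the daemon can classify writes at submit time; `PacedWriteScheduler` and its parameters are invented for illustration, not taken from the Meta or Pure stacks:

```python
# Host-side write scheduler sketch: only possible because every pending
# write is visible in userspace. Latency-sensitive writes bypass pacing;
# bulk writes drain at a fixed token rate per scheduling tick.

from collections import deque

class PacedWriteScheduler:
    def __init__(self, bulk_tokens_per_tick):
        self.latency_q = deque()
        self.bulk_q = deque()
        self.rate = bulk_tokens_per_tick
        self.tokens = 0.0

    def submit(self, write_id, bulk):
        (self.bulk_q if bulk else self.latency_q).append(write_id)

    def tick(self):
        """One scheduling round: drain latency writes, pace bulk writes."""
        dispatched = list(self.latency_q)
        self.latency_q.clear()
        # Refill tokens, capped at one tick's worth to forbid large bursts.
        self.tokens = min(self.tokens + self.rate, self.rate)
        while self.bulk_q and self.tokens >= 1:
            dispatched.append(self.bulk_q.popleft())
            self.tokens -= 1
        return dispatched
```

On a firmware-FTL drive this classification point does not exist: by the time the kernel sees congestion, the bulk writes are already queued inside the device.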
Trade-offs¶
- Complexity on the host. The stack now depends on a vendor daemon + ublk + io_uring; moving parts that used to live in the drive now live on the host.
- Vendor runtime on the host. DirectFlash software comes from Pure Storage; swapping in another vendor's flash is harder.
- CPU cost. Host cycles go to FTL work that a drive-embedded processor used to do.
- Harder debugging. Userspace daemons can crash / hang / leak; firmware FTL failures were rare and recoverable by a drive reset.
The trade is worth making when the visibility + policy-control wins exceed the host-complexity costs, which tends to hold at hyperscale, where media asymmetries and QoS requirements are tight.
Adjacent patterns¶
- patterns/rate-controller-for-asymmetric-media — composed with this pattern to address R/W asymmetry.
- patterns/middle-tier-storage-media — the broader media-deployment pattern this software stack serves.
Seen in¶
- sources/2025-03-04-meta-a-case-for-qlc-ssds-in-the-data-center — canonical Meta + Pure Storage instance.
Related¶
- systems/qlc-flash — the media whose properties justify host-side FTL.
- systems/pure-storage-directflash-module — the primary DFM + DirectFlash software instance.
- concepts/qlc-read-write-asymmetry — the problem userspace FTL helps solve.
- patterns/rate-controller-for-asymmetric-media — the schedule-layer composition partner.
- patterns/middle-tier-storage-media.
- companies/meta.