SYSTEM Cited by 3 sources
NVMe SSD¶
What it is¶
NVMe (Non-Volatile Memory Express) is the host-controller interface specification that exposes an SSD to the CPU over PCIe rather than over older storage buses (SATA, SAS) designed for spinning disks. An NVMe SSD is a solid-state drive that speaks NVMe natively.
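Why it matters: SATA was designed around the hundreds of IOPS a hard disk can deliver. NVMe was designed for the hundreds of thousands of IOPS an SSD can deliver, with multi-queue parallelism (up to 64K queues of up to 64K commands each, against AHCI's single 32-command queue), a streamlined command set, and direct PCIe lane access with no HBA translation.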
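On Linux, the multi-queue design is visible from user space. The sketch below counts the blk-mq hardware queues the kernel has set up for each NVMe namespace by listing the `mq/` directory under `/sys/block`. The function name `nvme_hw_queues` is made up for illustration, and the sysfs layout is assumed to match a reasonably recent kernel; on other systems the function simply returns an empty dict.

```python
from pathlib import Path


def nvme_hw_queues(sys_block="/sys/block"):
    """Return {namespace_name: hardware_queue_count} for NVMe block devices.

    Assumes the Linux blk-mq sysfs layout: /sys/block/<dev>/mq/<n>/.
    """
    root = Path(sys_block)
    result = {}
    if not root.is_dir():
        return result  # not Linux, or sysfs not mounted
    for dev in sorted(root.glob("nvme*n*")):
        mq = dev / "mq"
        if mq.is_dir():
            # One subdirectory per hardware queue.
            result[dev.name] = sum(1 for q in mq.iterdir() if q.is_dir())
    return result


if __name__ == "__main__":
    for name, count in nvme_hw_queues().items():
        print(f"{name}: {count} hardware queues")
```

The kernel typically allocates about one queue per CPU, up to what the drive supports, so the count is usually far below the protocol maximum.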
Dicken's positioning¶
"NVMe SSDs are a type of solid state disk that use the non-volatile memory host controller interface specification for blazing-fast IO speed and great bandwidth."
"A round trip from the CPU to a locally-attached NVMe SSD takes about 50,000 nanoseconds (50 microseconds)."
(Source: sources/2025-03-13-planetscale-io-devices-and-latency)
Performance profile¶
| Metric | Typical NVMe SSD |
|---|---|
| Round-trip latency | ~50 μs |
| Random-read IOPS | 100k–1M+ |
| Sequential throughput | 3–14 GB/s (PCIe 4.0–5.0) |
| Internal organisation | Targets → blocks → pages (concepts/nand-flash-page-block-erasure) |
| Parallelism | Multi-target (multi-lane) within the drive (concepts/ssd-parallelism-via-targets) |
Compared to a SATA SSD (~100 μs, ~100k IOPS max, 550 MB/s sequential), NVMe is a ~2× latency win and a roughly 5–25× sequential-throughput win (3–14 GB/s against 550 MB/s) from the interface change alone, on top of whatever the underlying NAND generation contributes.
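The latency and IOPS rows are linked by Little's law: outstanding IOs = IOPS × latency. The sketch below applies it to the figures above. It assumes per-IO latency stays flat as queue depth grows, which holds only until the drive saturates, and the function names are illustrative.

```python
def iops_at_queue_depth(latency_us: float, queue_depth: int) -> float:
    """IOPS achievable with `queue_depth` IOs always in flight (Little's law)."""
    return queue_depth * 1_000_000 / latency_us


def queue_depth_for(target_iops: float, latency_us: float) -> float:
    """Outstanding IOs needed to sustain `target_iops` at a given latency."""
    return target_iops * latency_us / 1_000_000


NVME_LATENCY_US = 50    # from the table above
SATA_LATENCY_US = 100   # from the SATA comparison above

print(iops_at_queue_depth(NVME_LATENCY_US, 1))      # 20000.0
print(iops_at_queue_depth(SATA_LATENCY_US, 1))      # 10000.0
print(queue_depth_for(1_000_000, NVME_LATENCY_US))  # 50.0
```

A single thread issuing one IO at a time therefore tops out near 20k IOPS on a 50 μs drive. The 100k–1M+ figures require tens of IOs in flight at once.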
Why NVMe matters to databases¶
- Each IO on the critical path costs less. NVMe's latency floor is close to the NAND's own, so there is less protocol overhead to subtract from the latency budget.
- Multi-queue submission matches multi-threaded database engines. A database with many concurrent transactions can keep the drive's parallel targets busy.
- Durability semantics are the same as a SATA SSD's: power-loss protection, write-ordering guarantees, and TRIM/DISCARD (the Deallocate operation in NVMe terms) all apply.
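The multi-queue point can be sketched from the application side: many threads issue independent positional reads against one file descriptor, so several IOs are in flight at once. This is a minimal illustration, not a benchmark. It reads a small temporary file that will be served from the page cache, and the helper names (`parallel_read`, `make_test_file`) are made up here. Measuring a real drive would need direct IO or a tool such as fio. `os.pread` is available on Unix-like systems only.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # 4 KiB, a common IO size; an assumption for this sketch


def read_block(fd: int, index: int) -> bytes:
    # pread takes an explicit offset, so threads can share one fd safely.
    return os.pread(fd, BLOCK, index * BLOCK)


def parallel_read(path: str, indices, workers: int = 16):
    """Read the given block indices concurrently; results keep input order."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(lambda i: read_block(fd, i), indices))
    finally:
        os.close(fd)


def make_test_file(blocks: int) -> str:
    """Create a temp file whose block i is filled with the byte i % 256."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(blocks):
            f.write(bytes([i % 256]) * BLOCK)
    return path


if __name__ == "__main__":
    path = make_test_file(64)
    try:
        data = parallel_read(path, [3, 17, 42, 63])
        print([block[0] for block in data])  # [3, 17, 42, 63]
    finally:
        os.remove(path)
```

CPython releases the GIL during the blocking read, so the threads overlap their IO waits even though they do not run Python code in parallel.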
Hidden performance issues¶
The physical constraints that affect all SSDs still apply to NVMe drives:
- Garbage collection causes tail-latency spikes under sustained writes (concepts/ssd-garbage-collection).
- Write amplification scales with workload layout (concepts/write-amplification).
- Endurance is capped at the NAND cell's P/E budget (concepts/write-endurance-nand).
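Write amplification and the P/E budget combine into a rough lifetime estimate: total NAND writes available (capacity × P/E cycles) divided by the NAND writes consumed per day (host writes × write-amplification factor). The sketch below is back-of-the-envelope arithmetic only. Every number in it is a hypothetical input chosen for illustration, not a figure from the sources or from any vendor datasheet.

```python
def drive_lifetime_years(capacity_tb: float,
                         pe_cycles: int,
                         host_writes_tb_per_day: float,
                         write_amplification: float) -> float:
    """Rough endurance estimate; ignores over-provisioning and wear-levelling skew."""
    total_nand_writes_tb = capacity_tb * pe_cycles
    nand_writes_tb_per_day = host_writes_tb_per_day * write_amplification
    return total_nand_writes_tb / nand_writes_tb_per_day / 365


# Hypothetical drive: 2 TB, 3,000 P/E cycles, 1 TB of host writes per day.
for waf in (1.0, 4.0):
    years = drive_lifetime_years(2.0, 3000, 1.0, waf)
    print(f"WAF {waf}: {years:.1f} years")
# WAF 1.0: 16.4 years
# WAF 4.0: 4.1 years
```

Lifetime falls in direct proportion to the write-amplification factor, which is why workload layout matters as much as raw write volume.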
NVMe in the cloud¶
NVMe SSDs appear in cloud deployments in two shapes:
Direct-attached (instance store)¶
The drive is physically in the hypervisor host. Bandwidth and latency approach bare-metal figures. Caveat: data on the drive does not survive instance stop or termination. Common on AWS i3 / i4 / i7 / im4gn / metal instance families.
Used by:
- systems/planetscale-metal — primary + 2 replicas on direct-attached NVMe.
Network-attached (EBS-class, backed by NVMe)¶
The customer's logical volume is served by a remote fleet of NVMe-backed storage servers over a datacenter network. Adds the ~5× latency hop (concepts/network-attached-storage-latency-penalty) but survives instance loss. See systems/aws-ebs (and AWS's custom systems/aws-nitro-ssd, which is the NVMe drive they built specifically for the EBS workload).
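The cost of the network hop is easiest to see for dependent IOs, where each read must finish before the next can be issued (for example, walking a cold B-tree). The sketch below takes the ~50 μs local round trip and the 5× penalty from this page. The 10 ms budget is a hypothetical figure chosen for illustration.

```python
LOCAL_NVME_US = 50                              # local round trip, from this page
NETWORK_PENALTY = 5                             # the ~5x hop described above
NETWORK_US = LOCAL_NVME_US * NETWORK_PENALTY    # 250 us


def dependent_ios_in_budget(budget_ms: float, latency_us: float) -> int:
    """How many strictly sequential IOs fit in a latency budget."""
    return int(budget_ms * 1000 // latency_us)


BUDGET_MS = 10  # hypothetical per-query budget

print(dependent_ios_in_budget(BUDGET_MS, LOCAL_NVME_US))  # 200
print(dependent_ios_in_budget(BUDGET_MS, NETWORK_US))     # 40
```

Parallel IOs hide part of this gap, but a chain of dependent reads pays the full per-hop latency every time.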
Seen in¶
- sources/2025-03-13-planetscale-io-devices-and-latency — canonical description of NVMe SSD latency (~50 μs) + PCIe / direct-attached framing + the "fastest modern storage gets" positioning.
- sources/2024-08-22-allthingsdistributed-continuous-reinvention-block-storage-at-aws — how AWS adapts NVMe to the network-attached model via Nitro + SRD + custom Nitro SSDs.
- sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18: benchmark-measured local-NVMe dominance. Ben Dicken's Postgres 17 vs Postgres 18 benchmarks on an i7i.2xlarge (1.8 TB local NVMe, 300,000 IOPS) vs three r7i.2xlarge EBS variants: local NVMe consistently wins every concurrency × range-size combination. Price-performance winner at $551.15/mo (more storage, no IOPS cap). Canonical wiki datum that even Postgres 18's new async-I/O modes (worker, io_uring) don't close the gap to local NVMe on the tested oltp_read_only shape.
Related¶
- systems/aws-ebs
- systems/aws-nitro-ssd
- systems/planetscale-metal
- concepts/nand-flash-page-block-erasure
- concepts/ssd-parallelism-via-targets
- concepts/ssd-garbage-collection
- concepts/storage-latency-hierarchy
- concepts/network-attached-storage-latency-penalty
- concepts/iops-throttle-network-storage
- concepts/write-endurance-nand