PATTERN
Direct-attached NVMe with replication¶
Pattern¶
Run each database instance on a direct-attached NVMe drive (local, fast, ephemeral) and solve the "instance dies, data dies" durability problem with application-layer replication (primary + N replicas + automated failover + backups) rather than with network-attached block storage.
┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ Primary instance │───►│ Replica 1 │───►│ Replica 2 │
│ ┌────────────────┐ │ │ ┌────────────────┐ │ │ ┌────────────────┐ │
│ │ DB engine │ │ │ │ DB engine │ │ │ │ DB engine │ │
│ └───────┬────────┘ │ │ └───────┬────────┘ │ │ └───────┬────────┘ │
│ │ 50μs │ │ │ 50μs │ │ │ 50μs │
│ ┌───────▼────────┐ │ │ ┌───────▼────────┐ │ │ ┌───────▼────────┐ │
│ │ Direct NVMe │ │ │ │ Direct NVMe │ │ │ │ Direct NVMe │ │
│ └────────────────┘ │ │ └────────────────┘ │ │ └────────────────┘ │
└──────────────────────┘ └──────────────────────┘ └──────────────────────┘
▲ ▲
│ Frequent backups │
└───────────────► Object storage ◄──────────────────────┘
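The pattern's moving parts are the replication topology and the failover decision. A minimal sketch of that decision logic (class and function names here are illustrative, not any real orchestrator's API):

```python
# Sketch of the application-layer failover step this pattern requires.
# All names and thresholds are hypothetical; a real system would also
# fence the old primary and re-provision a fresh NVMe node as a replica.
class Node:
    def __init__(self, name, replication_lag_s=0.0, healthy=True):
        self.name = name
        self.replication_lag_s = replication_lag_s
        self.healthy = healthy

def elect_new_primary(replicas):
    """Promote the healthy replica with the least replication lag."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        # Last resort in this pattern: restore from object-storage backups.
        raise RuntimeError("no healthy replica; restore from backup")
    return min(candidates, key=lambda r: r.replication_lag_s)

def failover_if_needed(primary, replicas):
    """Return the node that should be serving writes right now."""
    if primary.healthy:
        return primary
    return elect_new_primary(replicas)
```

The point of the sketch: because the NVMe drive dies with its instance, this promotion path (plus the backup restore path) *is* the durability story, not an optimization on top of it.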
When to use it¶
- OLTP workloads where IOPS + latency matter. Database commits, session stores, real-time analytics — any workload where every millisecond shows up in user-visible tail latency.
- Workloads that routinely saturate the cloud IOPS budget. If you're paying a lot for provisioned IOPS and still queueing, direct NVMe removes the cap.
- When you control the DB layer. You need to run your own replication, failover, and backup infrastructure.
When not to use it¶
- Storage layer isn't yours to design. If you're on a managed service backed by EBS / PD / similar, that choice is made a layer below you and isn't negotiable.
- Live-resize is critical. The pattern requires provisioning a new instance + migrating to grow capacity (zero-downtime but not instant). If you need minute-scale volume expansion at 3am, network-attached wins.
- Geographic replication requirements exceed local-DC replicas. Local direct-attached + cross-AZ replica still requires a working AZ failover story.
- Small / bursty workloads. At low QPS, the difference between 50 μs and 250 μs is invisible, and the operational complexity isn't worth it.
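Back-of-envelope arithmetic makes the "when to use it" cases concrete (a sketch using the ~50 μs / ~250 μs figures above; the 4-syncs-per-transaction count is an illustrative assumption, not a measured value):

```python
def storage_latency_per_txn(syncs_per_txn, device_latency_us):
    """Storage-stack contribution to one transaction's latency, in ms."""
    return syncs_per_txn * device_latency_us / 1000.0

# A write transaction that waits on 4 synchronous device writes
# (WAL append, commit record, etc. -- assumed count for illustration):
local  = storage_latency_per_txn(4, 50)   # direct NVMe:       0.2 ms
remote = storage_latency_per_txn(4, 250)  # network-attached:  1.0 ms

# The IOPS cap often bites first: at a 3,000 IOPS volume ceiling, the
# same 4-sync workload tops out near 3000 / 4 = 750 write txns/second
# per volume, no matter how fast the CPU is.
max_txn_per_s = 3000 // 4
```

At low QPS neither number is visible to users, which is exactly why the "small / bursty workloads" case above argues against the pattern; at saturation, both the 0.8 ms gap per transaction and the 750 txn/s ceiling show up directly in p99 and throughput.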
What it trades off¶
| Property | Direct NVMe + replication | Network-attached (EBS-class) |
|---|---|---|
| IO latency | ~50 μs | ~250 μs |
| IOPS ceiling | Hardware limit (hundreds of thousands) | Provisioned cap (3,000 IOPS gp3 baseline; more for a fee) |
| Instance loss recovery | Replica takes over; primary re-provisioned | Volume re-attaches to new instance |
| Live-resize | No (migrate to bigger node) | Yes |
| Durability floor | ~P^N (all N copies must fail) via replication + backups | Provider-internal replication |
| Operational surface | Replication, failover, backups in your stack | Hidden behind volume API |
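The P^N durability row is simple to verify. If each node is lost independently with probability P over some window, losing the data outright requires losing all N copies before a replacement is rebuilt (the numbers below are illustrative, not from the source):

```python
def prob_all_copies_lost(p_node_loss, n_copies):
    """Probability that every copy fails in the same window, assuming
    independent node failures: P ** N."""
    return p_node_loss ** n_copies

# Illustrative: a 1% chance of losing any one node in a window, with a
# primary + two replicas, gives roughly a 1-in-a-million data-loss
# probability per window -- before counting object-storage backups.
loss = prob_all_copies_lost(0.01, 3)  # ~1e-6
```

Independence is the load-bearing assumption: correlated failures (same rack, same AZ, same bad firmware) are why the pattern still pairs replicas with frequent backups to object storage, which bound the worst case to the backup interval.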
Implementations¶
PlanetScale Metal¶
"Each Metal cluster comes with a primary and two replicas by default for extremely durable data. […] Behind the scenes, we handle spinning up new nodes and migrating your data from your old instances to the new ones with zero downtime. […] With a Metal database, there is no artificial cap on IOPS." (Source: sources/2025-03-13-planetscale-io-devices-and-latency)
Canonical wiki instance: a three-node, primary-plus-two-replicas shape on direct-attached NVMe, running Vitess or Postgres.
Other instances (informational)¶
- CockroachDB on local SSD — similar structure: direct-attached local NVMe + Raft replication. Different consistency model (Raft quorum) but same durability substrate logic.
- MongoDB replica sets on local SSD — same shape at the document-database layer.
- Bare-metal MySQL + streaming replication — the pre-cloud default; replaced in most deployments by EBS-backed RDS / Aurora / Cloud SQL / etc.
Why the pattern didn't dominate the cloud era¶
Early cloud-database services optimised for the average application workload (stateless app server + modest DB), where EBS's latency cost was acceptable and elastic resizing was operationally valuable. As storage-heavy SaaS workloads scaled, the IOPS cap + latency penalty became visible in p99.9 and in monthly bills. The pattern reappears now because:
- Local-NVMe instance families (i3 / i4i / im4gn and successors) became general-purpose, not just niche hardware for ephemeral scratch storage.
- Kubernetes / orchestration made primary-failover + data-migration workflows operationally feasible.
- Customers got tired of paying for provisioned IOPS.
Seen in¶
- sources/2025-03-13-planetscale-io-devices-and-latency — canonical article-length treatment: latency hierarchy, durability math, IOPS-cap critique, and the explicit Metal embodiment.
Related¶
- systems/planetscale-metal
- systems/nvme-ssd
- systems/aws-ebs
- concepts/network-attached-storage-latency-penalty
- concepts/iops-throttle-network-storage
- concepts/storage-replication-for-durability
- concepts/storage-latency-hierarchy
- concepts/leader-follower-replication
- patterns/leader-based-partition-replication