
Direct-attached NVMe with replication

Pattern

Run each database instance on a direct-attached NVMe drive (local, fast, ephemeral) and solve the "instance dies, data dies" durability problem with application-layer replication (primary + N replicas + automated failover + backups) rather than with network-attached block storage.

┌──────────────────────┐    ┌──────────────────────┐    ┌──────────────────────┐
│  Primary instance    │───►│  Replica 1           │───►│  Replica 2           │
│  ┌────────────────┐  │    │  ┌────────────────┐  │    │  ┌────────────────┐  │
│  │ DB engine      │  │    │  │ DB engine      │  │    │  │ DB engine      │  │
│  └───────┬────────┘  │    │  └───────┬────────┘  │    │  └───────┬────────┘  │
│          │ 50μs      │    │          │ 50μs      │    │          │ 50μs      │
│  ┌───────▼────────┐  │    │  ┌───────▼────────┐  │    │  ┌───────▼────────┐  │
│  │ Direct NVMe    │  │    │  │ Direct NVMe    │  │    │  │ Direct NVMe    │  │
│  └────────────────┘  │    │  └────────────────┘  │    │  └────────────────┘  │
└──────────────────────┘    └──────────────────────┘    └──────────────────────┘
         ▲                                                       ▲
         │           Frequent backups                            │
         └───────────────► Object storage ◄──────────────────────┘
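Under an independent-failure assumption, the durability claim above can be put in rough numbers. This is a back-of-envelope sketch; the probabilities are illustrative assumptions, not measured values:

```python
# With N synchronized copies and an independent per-node loss probability p,
# all copies are lost with probability roughly p**N. Backups to object
# storage then bound the data lost in that event to the backup interval.

def p_all_copies_lost(p_node_loss: float, n_copies: int) -> float:
    """Probability that every synchronized copy fails (independence assumed)."""
    return p_node_loss ** n_copies

# Illustrative: 1% per-node loss probability, primary + 2 replicas.
p = p_all_copies_lost(0.01, 3)
print(f"P(all 3 copies lost) ~ {p:.0e}")
```

The independence assumption is the weak point: correlated failures (same rack, same AZ, same bad firmware) are exactly why the cross-AZ replica and the object-storage backups appear in the diagram.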

When to use it

  • OLTP workloads where IOPS + latency matter. Database commits, session stores, real-time analytics — any workload where every millisecond shows up in user-visible tail latency.
  • Workloads that routinely saturate the cloud IOPS budget. If you're paying a lot for provisioned IOPS and still queueing, direct NVMe removes the cap.
  • When you control the DB layer. You need to run your own replication, failover, and backup infrastructure.
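A quick way to see where a given mount sits on the 50 μs vs 250 μs spectrum is to time `fsync` on it, since the commit path of most databases is bounded by sync latency. This is a rough sketch, not a substitute for a proper fio run, and the default path is an assumption:

```python
import os
import tempfile
import time

def fsync_latency_us(path: str = ".", iters: int = 100) -> float:
    """Median fsync latency in microseconds for small appends under `path`.
    Run once on the NVMe mount and once on the network volume to compare."""
    samples = []
    with tempfile.NamedTemporaryFile(dir=path) as f:
        for _ in range(iters):
            f.write(b"x" * 512)
            f.flush()
            t0 = time.perf_counter()
            os.fsync(f.fileno())          # force the write to stable storage
            samples.append((time.perf_counter() - t0) * 1e6)
    samples.sort()
    return samples[len(samples) // 2]

print(f"median fsync: {fsync_latency_us():.0f} μs")
```

Filesystem caching, queue depth, and write size all move the number; treat it as a smoke test for "is this volume in the local-NVMe class or the network-attached class", not a benchmark.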

When not to use it

  • Storage layer isn't yours to design. If a managed service on EBS / PD / similar is non-negotiable, the pattern happens a layer below you.
  • Live-resize is critical. The pattern requires provisioning a new instance + migrating to grow capacity (zero-downtime but not instant). If you need minute-scale volume expansion at 3am, network-attached wins.
  • Geographic replication requirements exceed local-DC replicas. Local direct-attached + cross-AZ replica still requires a working AZ failover story.
  • Small / bursty workloads. At low QPS, the difference between 50 μs and 250 μs is invisible, and the operational complexity isn't worth it.
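The "provision a new instance + migrate" growth path mentioned above can be outlined in code. Every function and node name here is a hypothetical stand-in for an orchestration layer, modeled as an in-memory simulation:

```python
# Capacity growth without live-resize: join a larger node as a replica,
# let it catch up, promote it, retire the old primary. The cluster is
# modeled as a plain dict; real systems do this via their orchestrator
# (e.g. a reparent operation), not by mutating state directly.

def grow_capacity(cluster: dict, new_node: str) -> dict:
    cluster["replicas"].append(new_node)   # 1. larger node joins as a replica
    # 2. wait for replication catch-up (elided in this simulation)
    cluster["primary"] = new_node          # 3. promote (brief write pause)
    cluster["replicas"].remove(new_node)
    # 4. retire the old primary; its direct-attached NVMe data is ephemeral
    return cluster

cluster = {"primary": "db-small-1", "replicas": ["db-small-2", "db-small-3"]}
print(grow_capacity(cluster, "db-large-1"))
# → {'primary': 'db-large-1', 'replicas': ['db-small-2', 'db-small-3']}
```

Zero-downtime here means writes pause only for the promotion step; the bulk data copy happens while the new node is still a catching-up replica.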

What it trades off

Property                 Direct NVMe + replication                         Network-attached (EBS-class)
IO latency               ~50 μs                                            ~250 μs
IOPS ceiling             Hardware limit (hundreds of thousands)            Capped (3,000 IOPS gp3 default; pay to raise)
Instance loss recovery   Replica takes over; primary re-provisioned        Volume re-attaches to a new instance
Live-resize              No (migrate to a bigger node)                     Yes
Durability floor         ~p^N loss probability via replication + backups   Provider's internal replication
Operational surface      Replication, failover, backups in your stack      Hidden behind the volume API

Implementations

PlanetScale Metal

"Each Metal cluster comes with a primary and two replicas by default for extremely durable data. […] Behind the scenes, we handle spinning up new nodes and migrating your data from your old instances to the new ones with zero downtime. […] With a Metal database, there is no artificial cap on IOPS." (Source: sources/2025-03-13-planetscale-io-devices-and-latency)

Canonical wiki instance: the three-node, primary + two-replicas shape on direct-attached NVMe, running Vitess or Postgres.

Other instances (informational)

  • CockroachDB on local SSD — similar structure: direct-attached local NVMe + Raft replication. Different consistency model (Raft quorum) but same durability substrate logic.
  • MongoDB replica sets on local SSD — same shape at the document-database layer.
  • Bare-metal MySQL + streaming replication — the pre-cloud default; replaced in most deployments by EBS-backed RDS / Aurora / Cloud SQL / etc.

Why the pattern didn't dominate the cloud era

Early cloud-database services optimised for the average application workload (stateless app server + modest DB), where EBS's latency cost was acceptable and elastic resizing was operationally valuable. As storage-heavy SaaS workloads scaled, the IOPS cap + latency penalty became visible in p99.9 and in monthly bills. The pattern reappears now because:

  • Local-NVMe instance families (i3 / i4 / i7 / im4gn) became general-purpose, not just for Lambda-style ephemeral storage.
  • Kubernetes / orchestration made primary-failover + data-migration workflows operationally feasible.
  • Customers got tired of paying for provisioned IOPS.
