

PlanetScale Metal

Definition

PlanetScale Metal is a PlanetScale product tier (launched March 2025) in which each database instance runs on a direct-attached NVMe drive rather than on network-attached Amazon EBS. A Metal cluster ships with a primary + two replicas by default, with cluster-level application replication supplying durability; the storage fabric itself is not shared across nodes. Storage resizing is handled by spinning up new nodes with larger drives and migrating data "with zero downtime". Metal "has no artificial cap on IOPS."

With Metal, you get a full-fledged database cluster set up (Vitess or Postgres), with each database instance running with a direct-attached NVMe SSD drive. Each Metal cluster comes with a primary and two replicas by default for extremely durable data. We allow you to resize your servers with larger drives with just a few clicks of a button when you run up against storage limits. Behind the scenes, we handle spinning up new nodes and migrating your data from your old instances to the new ones with zero downtime. Perhaps most importantly, with a Metal database, there is no artificial cap on IOPS.

(Source: sources/2025-03-13-planetscale-io-devices-and-latency.)

Architectural thesis

The industry default for managed OLTP databases — EBS under Amazon RDS / Aurora, Google Cloud SQL, prior PlanetScale — is network-attached block storage. Metal rejects that default on two structural axes:

  1. Latency floor. Direct-attached NVMe delivers ~50 μs round-trips vs ~250 μs for EBS — a 5× gap on every IO. See concepts/network-attached-storage-latency-penalty.
  2. Reliability floor. EBS's gp3 SLO guarantees "at least 90% of provisioned IOPS 99% of the time" — ~14.4 min/day of potentially degraded operation. Even on io2, at fleet scale, "you'd still be expected to be in a failure condition roughly one third of the time in any given year on just that one database". See concepts/performance-variance-degradation + concepts/blast-radius-multiplier-at-fleet-scale.

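The arithmetic behind both floors can be checked directly; the sketch below just recomputes the illustrative figures quoted above (teaching numbers from the posts, not Metal measurements):

```python
# Recompute the two "floor" figures from the quoted teaching numbers.
# None of these are Metal measurements; they are the posts' illustrative inputs.

local_nvme_us = 50   # ~50 microsecond round-trip, direct-attached NVMe
ebs_us = 250         # ~250 microsecond round-trip, network-attached EBS
print(f"latency gap: {ebs_us / local_nvme_us:.0f}x per IO")  # -> 5x

# gp3 SLO: "at least 90% of provisioned IOPS 99% of the time".
# The 1% complement is the in-SLO window for degraded IO.
minutes_per_day = 24 * 60
degraded_min_per_day = minutes_per_day * (1 - 0.99)
print(f"in-SLO degraded window: {degraded_min_per_day:.1f} min/day")  # -> 14.4
```
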
The two historical reasons to pay the network-storage penalty — instance-independent durability (volume survives EC2 termination) and elastic capacity resizing (volume modify-in-place) — are replaced:

  • Durability ← cluster-level replication. Three independent nodes with independent storage collapse single-node failure into a cluster-protocol event. PlanetScale's framing: "with three servers, this goes down to 1% × 1% × 1% = 0.0001% chance (1 in one million)" under 1%/month single-server failure.
  • Elastic capacity ← automated spin-up + zero-downtime migrate. "We allow you to resize your servers with larger drives with just a few clicks of a button when you run up against storage limits. Behind the scenes, we handle spinning up new nodes and migrating your data from your old instances to the new ones with zero downtime."

Both are wiki-canonical on patterns/direct-attached-nvme-with-replication + patterns/shared-nothing-storage-topology.

Relationship to EBS failure rates

Metal is PlanetScale's structural response to the observed fleet-scale EBS failure rate (sources/2025-03-18-planetscale-the-real-failure-rate-of-ebs). The "real failure rate of EBS" post argues that per-volume SLO arithmetic compounds at fleet scale — even on io2, a single database should expect to spend roughly one third of any given year in a failure condition — and that a shared-nothing local-storage design lets the remaining shards and nodes continue operating when one fails.

Durability model

PlanetScale's durability argument for Metal, verbatim:

Say in a given month, there is a 1% chance of a server failing. With a single server, this means we have a 1% chance of losing our data each month. Maybe this doesn't sound like a big deal, but over a 10 year period this translates to having a greater than 70% chance of having lost your data!

However, with three servers, this goes down to 1% × 1% × 1% = 0.0001% chance (1 in one million). At PlanetScale the protection is actually far stronger than even this, as we automatically detect and replace failed nodes in your cluster. We take frequent and reliable backups of the data in your database for added protection.

The math is illustrative (independent monthly 1% failure is a placeholder); correlated failure — AZ-wide power event, instance-type retirement, software bug — is not covered by the independent-failure formula. PlanetScale notes they "automatically detect and replace failed nodes" + "take frequent and reliable backups" as the empirical hardening on top of the math.
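The placeholder arithmetic is easy to verify; a minimal sketch under the post's own independent-failure assumption (1%/month per server, 120 months):

```python
# Check the illustrative durability arithmetic quoted above.
# Assumes independent 1%/month failures, as the post does; correlated
# failures (AZ outage, software bug) are explicitly outside this model.

p_month = 0.01

# Single server over 10 years (120 independent months):
p_loss_10y = 1 - (1 - p_month) ** 120
print(f"single server, 10y: {p_loss_10y:.1%}")          # -> 70.1%

# Three independent servers all failing in the same month:
p_all_three = p_month ** 3
print(f"three servers, same month: {p_all_three:.6%}")  # -> 0.000100%
print(f"i.e. 1 in {1 / p_all_three:,.0f}")              # -> 1 in 1,000,000
```
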

What Metal is built on

  • Direct-attached NVMe on EC2 storage-optimised instance types (Metal doesn't name the exact types in the disclosed posts; i4i, i3en, im4gn are the industry-standard candidates).
  • Vitess for MySQL clusters (the original PlanetScale substrate).
  • Postgres as the alternative engine (also on Metal; see PlanetScale system page).
  • Primary + 2 replicas by default — 3-way cluster replication for durability.
  • Automated node lifecycle — health monitoring, replacement provisioning, reparent on node loss.
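The automated node lifecycle in the last bullet can be sketched as a toy detect/replace/reparent loop. Everything below (node names, the promotion policy, the `handle_failure` flow) is a hypothetical illustration, not PlanetScale's implementation:

```python
# Hypothetical sketch of the automated node-lifecycle loop: on node loss,
# reparent if the primary died, then provision a replacement replica.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    role: str            # "primary" | "replica"
    healthy: bool = True

@dataclass
class Cluster:
    nodes: list = field(default_factory=list)

    def primary(self):
        return next(n for n in self.nodes if n.role == "primary")

    def handle_failure(self, failed):
        # 1. Drop the failed node; its local NVMe data is gone with it.
        self.nodes.remove(failed)
        # 2. If the primary died, promote a surviving replica (reparent).
        if failed.role == "primary":
            next(n for n in self.nodes if n.role == "replica").role = "primary"
        # 3. Provision a fresh replica to restore the 3-node topology.
        self.nodes.append(Node(name=f"{failed.name}-replacement", role="replica"))

cluster = Cluster([Node("a", "primary"), Node("b", "replica"), Node("c", "replica")])
cluster.handle_failure(cluster.primary())
assert cluster.primary().name == "b" and len(cluster.nodes) == 3
```

A real implementation would promote the most caught-up replica and re-seed the replacement from a peer or backup before it serves traffic; the sketch elides replication lag entirely.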

Seen in

  • sources/2025-03-13-planetscale-io-devices-and-latency — Metal announcement / latency argument. Establishes the ~50 μs local vs ~250 μs EBS latency datum, the 3,000-IOPS gp3-default cap vs "no artificial cap" on Metal, and the replication-for-durability argument.
  • sources/2025-03-18-planetscale-the-real-failure-rate-of-ebs — complementary reliability argument. "This is why we built PlanetScale Metal. With a shared-nothing architecture that uses local storage instead of network-attached storage like EBS, the rest of the shards and nodes in a database are able to continue to operate without problem."
  • sources/2025-07-01-planetscale-planetscale-for-postgres — Metal architecture extended from MySQL to Postgres. PlanetScale's launch announcement for PlanetScale for Postgres brings Metal's direct-attached-NVMe cluster shape to Postgres: "PlanetScale Metal's locally-attached NVMe SSD drives fundamentally change the performance/cost ratio for hosting relational databases in the cloud. We're excited to bring this performance to Postgres." Metal-for-Postgres inherits the same primary + 2 replicas, direct-attached-NVMe topology, and no-IOPS-cap posture as Metal-for-MySQL. First wiki datum that Metal is engine-agnostic at the cluster-shape layer — the local-NVMe + replication architecture is a substrate under both engines, not a MySQL-specific choice.

  • sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18 — First benchmark-measured Metal-adjacent datum on the wiki. Ben Dicken's Postgres 17 vs 18 benchmarks use an i7i.2xlarge with 1.8 TB local NVMe as the canonical direct-attached-NVMe-on-EC2 instance. The i7i wins every scenario tested — every Postgres config × concurrency × range-size combination on sysbench oltp_read_only. Even Postgres 18's new async-I/O modes (worker, io_uring) fail to close the gap from EBS to local NVMe. Price-performance: the i7i at $551.15/mo beats r7i+io2-16k at $1,513.82/mo and narrowly beats r7i+gp3-10k at $492.32/mo on a per-QPS basis. While PlanetScale doesn't benchmark Metal itself (the post uses bare EC2 instances to avoid the "vendor benchmarks own product" framing), the i7i data is the closest vendor-neutral proxy for what Metal delivers on a primary + 2 replicas topology. Empirical backing for the "we built PlanetScale Metal" structural argument; the local-NVMe instance type that PlanetScale uses under Metal is what wins this benchmark.

  • sources/2026-04-21-planetscale-benchmarking-postgres — Names Metal's on-AWS reference shape. PlanetScale's multi-vendor Postgres benchmark methodology post confirms that PlanetScale for Postgres runs on i8g M-320 (4 vCPU, 32 GB RAM, 937 GB NVMe SSD) with a primary + 2 replicas across 3 AZs. This is the Metal-on-Postgres-on-AWS shape, the direct successor to the 2025-10-14 post's i7i.2xlarge proxy datum. All benchmarked competitors (Aurora, AlloyDB, CrunchyData, Supabase, TigerData, Neon) use network-attached storage — Metal is the only local-NVMe point in the comparison. The post's availability-posture-equalisation rule ("each would also need to add replicas") is the accounting corollary of Metal's default 3-AZ topology. The Telescope harness publishes these benchmarks per the new patterns/reproducible-benchmark-publication pattern, and price-performance becomes the fourth benchmark dimension Metal's architecture is engineered to dominate.

Caveats

  • No disclosed Metal production benchmarks. The announcement-window posts don't publish QPS / p99 / IOPS numbers for Metal against EBS-backed Aurora or RDS. The latency figures (~50 μs) are illustrative teaching numbers, not Metal measurements. The failure-rate figures are EBS-side arithmetic under stated assumptions, not Metal-side measurements.
  • Metal's own failure modes not fully described. Local-NVMe drive failure rates, instance termination handling, noisy-neighbour behaviour on shared EC2 hardware, and cross-replica consistency during reparent on Metal are not architecturally detailed in the announcement posts.
  • Shared-nothing envelope still has an AZ boundary. An AZ-wide power event takes out all three nodes if they are co-located in one AZ. Metal's AZ / cross-AZ / cross-region replication topology isn't spelled out.
  • Resize migration is not instant. The resize happens "with zero downtime", but migrating data to the new instances takes real time; during that window the cluster is in a transient-topology state.
  • Exact instance types undisclosed. Metal's direct-attached NVMe presumably runs on EC2 storage-optimised instance types (such as i3en, i4i, im4gn), but PlanetScale doesn't name them in the disclosed posts.
  • Metal competes with PlanetScale's own EBS-backed tiers. Workloads that don't stress EBS (cached read-heavy, small-volume OLTP) may not see enough benefit to justify the pricing delta; the posts don't quantify the price gap.