PlanetScale Metal¶
Definition¶
PlanetScale Metal is a PlanetScale product tier (launched March 2025) in which each database instance runs on a direct-attached NVMe drive rather than on network-attached Amazon EBS. A Metal cluster ships with a primary + two replicas by default, with durability supplied by cluster-level replication at the database layer; the storage fabric itself is not shared across nodes. Storage resizing is handled by spinning up new nodes with larger drives and migrating data "with zero downtime". Metal "has no artificial cap on IOPS."
With Metal, you get a full-fledged database cluster set up (Vitess or Postgres), with each database instance running with a direct-attached NVMe SSD drive. Each Metal cluster comes with a primary and two replicas by default for extremely durable data. We allow you to resize your servers with larger drives with just a few clicks of a button when you run up against storage limits. Behind the scenes, we handle spinning up new nodes and migrating your data from your old instances to the new ones with zero downtime. Perhaps most importantly, with a Metal database, there is no artificial cap on IOPS.
(Source: sources/2025-03-13-planetscale-io-devices-and-latency.)
Architectural thesis¶
The industry default for managed OLTP databases — EBS under Amazon RDS / Aurora, Google Cloud SQL, prior PlanetScale — is network-attached block storage. Metal rejects that default on two structural axes:
- Latency floor. Direct-attached NVMe delivers ~50 μs round-trips vs ~250 μs for EBS — a 5× gap on every IO. See concepts/network-attached-storage-latency-penalty.
- Reliability floor. EBS's gp3 SLO guarantees "at least 90% of provisioned IOPS 99% of the time" — 14 min/day of potential degraded operation. Even on io2, at fleet scale, "you'd still be expected to be in a failure condition roughly one third of the time in any given year on just that one database". See concepts/performance-variance-degradation + concepts/blast-radius-multiplier-at-fleet-scale.
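Both floors are checkable arithmetic. A minimal sketch of the gp3 SLO window and the latency gap (the ~50 μs / ~250 μs figures are the source's illustrative teaching numbers, not measurements):

```python
# gp3 SLO: "at least 90% of provisioned IOPS 99% of the time".
# The remaining 1% of any given day may run degraded.
MINUTES_PER_DAY = 24 * 60
degraded_minutes = MINUTES_PER_DAY * (1 - 0.99)
print(f"potential degraded window: {degraded_minutes:.1f} min/day")  # 14.4

# Latency floor: ~50 us local NVMe vs ~250 us EBS per IO round-trip.
local_us, ebs_us = 50, 250
print(f"per-IO latency gap: {ebs_us // local_us}x")  # 5x
```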
The two historical reasons to pay the network-storage penalty — instance-independent durability (volume survives EC2 termination) and elastic capacity resizing (volume modify-in-place) — are replaced:
- Durability ← cluster-level replication. Three independent nodes with independent storage collapse single-node failure into a cluster-protocol event. PlanetScale's framing: "with three servers, this goes down to 1% × 1% × 1% = 0.0001% chance (1 in one million)" under 1%/month single-server failure.
- Elastic capacity ← automated spin-up + zero-downtime migrate. "We allow you to resize your servers with larger drives with just a few clicks of a button when you run up against storage limits. Behind the scenes, we handle spinning up new nodes and migrating your data from your old instances to the new ones with zero downtime."
Both are wiki-canonical on patterns/direct-attached-nvme-with-replication + patterns/shared-nothing-storage-topology.
Relationship to EBS failure rates¶
Metal is PlanetScale's structural response to the observed fleet-scale EBS failure rate (sources/2025-03-18-planetscale-the-real-failure-rate-of-ebs). The "real failure rate of EBS" post argues that:
- At fleet scale, EBS event rate is effectively 100% ("multiple events on a daily basis" across PlanetScale's fleet).
- io2 does not fix it ("correlated failure inside of a single zone, even using io2 volumes").
- PlanetScale's patterns/automated-volume-health-monitoring + patterns/zero-downtime-reparent-on-degradation clamp the impact window on EBS but cannot eliminate events.
- The structural fix is shared-nothing on local NVMe — i.e. Metal.
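The fleet-scale argument is standard at-least-one compounding: a small per-volume event probability, multiplied across thousands of volumes, approaches certainty. A sketch under an independence assumption, with hypothetical rates (the source does not publish per-volume numbers):

```python
# Fleet-scale blast radius: a tiny per-volume event rate compounds across
# many volumes. The 0.1%/day rate below is a HYPOTHETICAL placeholder,
# not AWS's published figure.
def p_any_event(p_per_volume: float, n_volumes: int) -> float:
    """P(at least one volume is in an event), assuming independence."""
    return 1 - (1 - p_per_volume) ** n_volumes

# One database: a given volume is degraded today with probability 0.1%.
print(f"single volume: {p_any_event(0.001, 1):.3%}")  # 0.100%

# A fleet of 10,000 volumes: some volume is degraded essentially every day,
# matching the "multiple events on a daily basis" observation.
print(f"10k-volume fleet: {p_any_event(0.001, 10_000):.3%}")
```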
Durability model¶
PlanetScale's durability argument for Metal, verbatim:
Say in a given month, there is a 1% chance of a server failing. With a single server, this means we have a 1% chance of losing our data each month. Maybe this doesn't sound like a big deal, but over a 10 year period this translates to having a greater than 70% chance of having lost your data!
However, with three servers, this goes down to 1% × 1% × 1% = 0.0001% chance (1 in one million). At PlanetScale the protection is actually far stronger than even this, as we automatically detect and replace failed nodes in your cluster. We take frequent and reliable backups of the data in your database for added protection.
The math is illustrative (independent monthly 1% failure is a placeholder); correlated failure — AZ-wide power event, instance-type retirement, software bug — is not covered by the independent-failure formula. PlanetScale notes they "automatically detect and replace failed nodes" + "take frequent and reliable backups" as the empirical hardening on top of the math.
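The quoted arithmetic reproduces directly; a sketch using the post's illustrative 1%/month figure and its independence assumption:

```python
# Independent-failure durability model from the Metal announcement.
p_month = 0.01                        # assumed 1%/month single-server failure

# Single server over 10 years (120 months): chance of at least one failure.
p_10yr = 1 - (1 - p_month) ** 120
print(f"single server, 10 years: {p_10yr:.1%}")   # greater than 70%

# Three independent servers: all three must fail in the same month.
p_cluster = p_month ** 3
print(f"three replicas, per month: {p_cluster:.6%}")   # 0.0001% (1 in a million)
```

As the prose above notes, the model excludes correlated failure; the cluster-level figure holds only while the three failures stay independent.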
What Metal is built on¶
- Direct-attached NVMe on EC2 storage-optimised instance types (Metal doesn't name the exact types in the disclosed posts; i4i, i3en, and im4gn are the industry-standard candidates).
- Vitess for MySQL clusters (the original PlanetScale substrate).
- Postgres as the alternative engine (also on Metal; see PlanetScale system page).
- Primary + 2 replicas by default — 3-way cluster replication for durability.
- Automated node lifecycle — health monitoring, replacement provisioning, reparent on node loss.
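The automated node lifecycle in the last bullet can be sketched as a monitor/reparent/replace loop. This is entirely hypothetical (PlanetScale's actual control plane is undisclosed); `Node`, `Cluster`, and `reconcile` are illustrative names:

```python
# Hypothetical sketch of the automated node lifecycle: monitor health,
# reparent away from a failed primary, provision a replacement replica.
# Not PlanetScale's actual control plane.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    role: str            # "primary" or "replica"
    healthy: bool = True

@dataclass
class Cluster:
    nodes: list = field(default_factory=list)

    def primary(self) -> Node:
        return next(n for n in self.nodes if n.role == "primary")

    def reconcile(self) -> None:
        """One pass of the monitor -> reparent -> replace loop."""
        for node in list(self.nodes):
            if node.healthy:
                continue
            if node.role == "primary":
                # Reparent: promote a healthy replica before removal.
                promoted = next(n for n in self.nodes
                                if n.role == "replica" and n.healthy)
                promoted.role = "primary"
            # Replace: drop the failed node, provision a fresh replica.
            self.nodes.remove(node)
            self.nodes.append(Node(f"{node.name}-replacement", "replica"))

cluster = Cluster([Node("a", "primary"), Node("b", "replica"), Node("c", "replica")])
cluster.nodes[0].healthy = False      # primary fails
cluster.reconcile()
print(cluster.primary().name)         # a healthy replica was promoted
```

The design point the sketch makes: with shared-nothing storage, replacing a node is a data-copy problem, not a volume-reattach problem.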
Seen in¶
- sources/2025-03-13-planetscale-io-devices-and-latency — Metal announcement / latency argument. Establishes the ~50 μs local vs ~250 μs EBS latency datum, the 3,000-IOPS gp3-default cap vs "no artificial cap" on Metal, and the replication-for-durability argument.
- sources/2025-03-18-planetscale-the-real-failure-rate-of-ebs — complementary reliability argument. "This is why we built PlanetScale Metal. With a shared-nothing architecture that uses local storage instead of network-attached storage like EBS, the rest of the shards and nodes in a database are able to continue to operate without problem."
- sources/2025-07-01-planetscale-planetscale-for-postgres — Metal architecture extended from MySQL to Postgres. PlanetScale's launch announcement for PlanetScale for Postgres brings Metal's direct-attached-NVMe cluster shape to Postgres: "PlanetScale Metal's locally-attached NVMe SSD drives fundamentally change the performance/cost ratio for hosting relational databases in the cloud. We're excited to bring this performance to Postgres." Metal-for-Postgres inherits the same primary + 2 replicas / direct-attached-NVMe topology + no IOPS cap as Metal-for-MySQL. First wiki datum that Metal is engine-agnostic at the cluster-shape layer — the local-NVMe + replication architecture is a substrate under both engines, not a MySQL-specific choice.
- sources/2025-10-14-planetscale-benchmarking-postgres-17-vs-18 — first benchmark-measured Metal-adjacent datum on the wiki. Ben Dicken's Postgres 17 vs 18 benchmarks use an i7i.2xlarge with 1.8 TB local NVMe as the canonical direct-attached-NVMe-on-EC2 instance. The i7i wins every scenario tested — every Postgres config × concurrency × range-size combination on sysbench oltp_read_only. Even Postgres 18's new async-I/O modes (worker, io_uring) fail to close the gap from EBS to local NVMe. Price-performance: the i7i at $551.15/mo beats r7i + io2-16k at $1,513.82/mo and narrowly beats r7i + gp3-10k at $492.32/mo on a per-QPS basis. While PlanetScale doesn't benchmark Metal itself (the post uses bare EC2 instances to avoid the "vendor benchmarks own product" framing), the i7i data is the closest vendor-neutral proxy for what Metal delivers on a primary + 2 replicas topology. Empirical backing for the "we built PlanetScale Metal" structural argument; the local-NVMe instance type that PlanetScale uses under Metal is what wins this benchmark.
- sources/2026-04-21-planetscale-benchmarking-postgres — names Metal's on-AWS reference shape. PlanetScale's multi-vendor Postgres benchmark methodology post confirms that PlanetScale for Postgres runs on i8g M-320 (4 vCPU, 32 GB RAM, 937 GB NVMe SSD) with a primary + 2 replicas across 3 AZs. This is the Metal-on-Postgres-on-AWS shape, the direct successor to the 2025-10-14 post's i7i.2xlarge proxy datum. All benchmarked competitors (Aurora, AlloyDB, CrunchyData, Supabase, TigerData, Neon) use network-attached storage — Metal is the only local-NVMe point in the comparison. The post's availability-posture-equalisation rule ("each would also need to add replicas") is the accounting corollary of Metal's default 3-AZ topology. The Telescope harness publishes these benchmarks per the new patterns/reproducible-benchmark-publication pattern, and price-performance becomes the fourth benchmark dimension Metal's architecture is engineered to dominate.
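The per-QPS framing in the 2025-10-14 entry reduces to a cost-per-throughput ratio. A sketch with the disclosed monthly prices and hypothetical QPS placeholders (the measured throughput figures live in the source post and are not reproduced here):

```python
# Price-performance as dollars per sustained 1k QPS. Monthly prices are
# the 2025-10-14 post's figures; the QPS values are HYPOTHETICAL
# placeholders standing in for the benchmark's measured throughput.
configs = {
    "i7i (local NVMe)": (551.15, 100_000),    # ($/month, hypothetical QPS)
    "r7i + io2-16k":    (1_513.82, 90_000),
    "r7i + gp3-10k":    (492.32, 70_000),
}

def usd_per_kqps(usd_month: float, qps: int) -> float:
    return usd_month / (qps / 1000)

for name, (usd, qps) in configs.items():
    print(f"{name:18s} ${usd_per_kqps(usd, qps):7.2f} per 1k QPS")

winner = min(configs, key=lambda n: usd_per_kqps(*configs[n]))
print("best price-performance:", winner)
```

The point of the ratio: a cheaper sticker price (gp3) can still lose on price-performance once throughput is in the denominator.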
Caveats¶
- No disclosed Metal production benchmarks. The announcement-window posts don't publish QPS / p99 / IOPS numbers for Metal against EBS-backed Aurora or RDS. The latency figures (~50 μs) are illustrative teaching numbers, not Metal measurements. The failure-rate figures are EBS-side arithmetic under stated assumptions, not Metal-side measurements.
- Metal's own failure modes not fully described. Local-NVMe drive failure rates, instance termination handling, noisy-neighbour behaviour on shared EC2 hardware, and cross-replica consistency during reparent on Metal are not architecturally detailed in the announcement posts.
- Shared-nothing envelope still has an AZ boundary. An AZ-wide power event takes out all three nodes if they are co-located in one AZ. Metal's AZ / cross-AZ / cross-region replication topology isn't spelled out.
- Resize migration is not instant. The resize is "with zero downtime", but migrating to new instances takes real time; during that window the cluster is in a transient-topology state.
- Direct-attached-NVMe implementations on the public cloud are EC2 storage-optimised instance types (such as i3en, i4i, im4gn) — PlanetScale doesn't name them in the disclosed posts.
- Metal competes with itself. Some workloads that don't stress EBS (cached read-heavy, small-volume OLTP) don't see enough benefit to justify the pricing delta; the post doesn't quantify the price gap.
Related¶
- systems/planetscale
- systems/planetscale-for-postgres
- systems/telescope-planetscale
- systems/vitess
- systems/mysql
- systems/innodb
- systems/postgresql
- systems/aws-ebs
- concepts/network-attached-storage-latency-penalty
- concepts/performance-variance-degradation
- concepts/correlated-ebs-failure
- concepts/blast-radius-multiplier-at-fleet-scale
- concepts/slow-is-failure
- concepts/compute-storage-separation
- concepts/price-performance-ratio
- patterns/direct-attached-nvme-with-replication
- patterns/shared-nothing-storage-topology
- patterns/automated-volume-health-monitoring
- patterns/zero-downtime-reparent-on-degradation
- patterns/reproducible-benchmark-publication
- companies/planetscale