
PlanetScale Metal

Definition

PlanetScale Metal is a PlanetScale product tier (launched March 2025) in which each database instance runs on a direct-attached NVMe drive rather than on network-attached Amazon EBS. A Metal cluster ships with a primary + two replicas by default; durability comes from cluster-level database replication, and no storage is shared across nodes. Storage resizing is handled by spinning up new nodes with larger drives and migrating data "with zero downtime". Metal has "no artificial cap on IOPS."

With Metal, you get a full-fledged database cluster set up (Vitess or Postgres), with each database instance running with a direct-attached NVMe SSD drive. Each Metal cluster comes with a primary and two replicas by default for extremely durable data. We allow you to resize your servers with larger drives with just a few clicks of a button when you run up against storage limits. Behind the scenes, we handle spinning up new nodes and migrating your data from your old instances to the new ones with zero downtime. Perhaps most importantly, with a Metal database, there is no artificial cap on IOPS.

(Source: sources/2025-03-13-planetscale-io-devices-and-latency.)

Architectural thesis

The industry default for managed OLTP databases (EBS under Amazon RDS, Aurora's networked storage layer, persistent disks under Google Cloud SQL, and pre-Metal PlanetScale) is network-attached storage. Metal rejects that default on two structural axes:

Latency floor. Direct-attached NVMe delivers ~50 μs round-trips vs ~250 μs for EBS — a 5× gap on every IO. See concepts/network-attached-storage-latency-penalty.
Reliability floor. EBS's gp3 SLO guarantees "at least 90% of provisioned IOPS 99% of the time" — 14 min/day of potential degraded operation. Even on io2, at fleet scale, "you'd still be expected to be in a failure condition roughly one third of the time in any given year on just that one database". See concepts/performance-variance-degradation + concepts/blast-radius-multiplier-at-fleet-scale.
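A back-of-envelope sketch of both floors, using only the figures quoted above (~50 μs vs ~250 μs per IO, and the gp3 "99% of the time" clause). The count of 100 dependent IOs is a hypothetical chosen for illustration, not a PlanetScale measurement.

```python
NVME_RTT_US = 50    # ~50 μs direct-attached NVMe round-trip (figure quoted above)
EBS_RTT_US = 250    # ~250 μs EBS round-trip (figure quoted above)


def serial_io_time_ms(n_ios: int, rtt_us: float) -> float:
    """Wall-clock time for n dependent (serial) IOs at a given round-trip latency."""
    return n_ios * rtt_us / 1000.0


def gp3_degraded_minutes_per_day(slo_fraction: float = 0.99) -> float:
    """Minutes per day outside the window in which gp3 promises >=90% of provisioned IOPS."""
    return (1.0 - slo_fraction) * 24 * 60


if __name__ == "__main__":
    print(f"latency gap: {EBS_RTT_US / NVME_RTT_US:.0f}x")
    # Hypothetical: a request path that issues 100 dependent IOs.
    print(f"100 serial IOs: NVMe {serial_io_time_ms(100, NVME_RTT_US):.1f} ms, "
          f"EBS {serial_io_time_ms(100, EBS_RTT_US):.1f} ms")
    print(f"gp3 potential degraded window: {gp3_degraded_minutes_per_day():.1f} min/day")
```

The per-IO gap compounds linearly on any chain of dependent IOs, which is why a 5× round-trip difference shows up directly in query latency rather than being amortised away.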

The two historical reasons to pay the network-storage penalty, instance-independent durability (the volume survives EC2 termination) and elastic capacity resizing (modify the volume in place), are both replaced:

Durability ← cluster-level replication. Three independent nodes with independent storage collapse single-node failure into a cluster-protocol event. PlanetScale's framing, assuming a 1% monthly single-server failure probability: "with three servers, this goes down to 1% × 1% × 1% = 0.0001% chance (1 in one million)".
Elastic capacity ← automated spin-up + zero-downtime migrate. "We allow you to resize your servers with larger drives with just a few clicks of a button when you run up against storage limits. Behind the scenes, we handle spinning up new nodes and migrating your data from your old instances to the new ones with zero downtime."

Both substitutions are canonicalised on patterns/direct-attached-nvme-with-replication + patterns/shared-nothing-storage-topology.
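The replace-rather-than-resize flow quoted above can be modelled as a toy state transition. This is a minimal sketch under stated assumptions: the `Node` / `Cluster` types, the `-new` naming, and the ordering (add larger replicas, reparent, decommission) are hypothetical illustrations of "spinning up new nodes and migrating your data ... with zero downtime", not PlanetScale's orchestration code. Data copy and replication catch-up are elided.

```python
from dataclasses import dataclass, field

# Hypothetical toy model; not PlanetScale's orchestration code.


@dataclass
class Node:
    name: str
    disk_gb: int
    role: str  # "primary" or "replica"


@dataclass
class Cluster:
    nodes: list[Node] = field(default_factory=list)

    def primary(self) -> Node:
        # Exactly one node is expected to hold the primary role.
        return next(n for n in self.nodes if n.role == "primary")


def resize(cluster: Cluster, new_disk_gb: int) -> Cluster:
    """Hypothetical replace-not-resize flow: add larger nodes, then retire old ones."""
    old_ids = {id(n) for n in cluster.nodes}
    # 1. Provision one larger replica per existing node. (Assumed: in a real
    #    system each would copy data and catch up on replication before serving.)
    new_nodes = [Node(f"{n.name}-new", new_disk_gb, "replica") for n in cluster.nodes]
    cluster.nodes.extend(new_nodes)
    # 2. Reparent: demote the old primary and promote one new node, so the
    #    cluster always has exactly one primary.
    cluster.primary().role = "replica"
    new_nodes[0].role = "primary"
    # 3. Decommission the old, smaller nodes.
    cluster.nodes = [n for n in cluster.nodes if id(n) not in old_ids]
    return cluster


if __name__ == "__main__":
    c = Cluster([Node("a", 1000, "primary"), Node("b", 1000, "replica"), Node("c", 1000, "replica")])
    resize(c, 2000)
    print([(n.name, n.disk_gb, n.role) for n in c.nodes])
```

The design point the sketch illustrates is that capacity never shrinks below the original cluster during the swap: new nodes are added before any old node is removed, so the only client-visible step is the reparent.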

Relationship to EBS failure rates

Metal is PlanetScale's structural response to the observed fleet-scale EBS failure rate (). The "real failure rate of EBS" post argues that:

At fleet scale, EBS event rate is effectively 100% ("multiple events on a daily basis" across PlanetScale's fleet).
io2 does not fix it ("correlated failure inside of a single zone, even using io2 volumes").
PlanetScale's patterns/automated-volume-health-monitoring + patterns/zero-downtime-reparent-on-degradation clamp the impact window on EBS but cannot eliminate events.
The structural fix is shared-nothing on local NVMe — i.e. Metal.
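Why a fleet sees events "on a daily basis" even when each volume is individually reliable follows from the complement rule, assuming independent volumes. The per-volume daily event probability of 0.001 and the fleet sizes below are hypothetical placeholders; the posts cited here do not publish a per-volume rate. Correlated in-zone failures, which the io2 quote describes, are not captured by this independence assumption and only make the picture worse.

```python
def p_at_least_one_event(p_per_volume: float, volumes: int) -> float:
    """P(at least one of `volumes` independent volumes has an event in one window)."""
    return 1.0 - (1.0 - p_per_volume) ** volumes


def expected_events(p_per_volume: float, volumes: int) -> float:
    """Expected number of volume events per window across the fleet."""
    return p_per_volume * volumes


if __name__ == "__main__":
    p_daily = 0.001  # hypothetical per-volume daily event probability
    for fleet in (1, 100, 10_000):  # hypothetical fleet sizes
        print(fleet,
              round(p_at_least_one_event(p_daily, fleet), 4),
              expected_events(p_daily, fleet))
```

As the fleet grows, the probability of "some event today" approaches 1 and the expected count grows linearly, which is the blast-radius-multiplier argument in numeric form.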

Durability model

PlanetScale's durability argument for Metal, verbatim:

Say in a given month, there is a 1% chance of a server failing. With a single server, this means we have a 1% chance of losing our data each month. Maybe this doesn't sound like a big deal, but over a 10 year period this translates to having a greater than 70% chance of having lost your data!

However, with three servers, this goes down to 1% × 1% × 1% = 0.0001% chance (1 in one million). At PlanetScale the protection is actually far stronger than even this, as we automatically detect and replace failed nodes in your cluster. We take frequent and reliable backups of the data in your database for added protection.

The math is illustrative (independent monthly 1% failure is a placeholder); correlated failure — AZ-wide power event, instance-type retirement, software bug — is not covered by the independent-failure formula. PlanetScale notes they "automatically detect and replace failed nodes" + "take frequent and reliable backups" as the empirical hardening on top of the math.
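The quoted arithmetic can be reproduced directly, and the correlated-failure caveat can be made concrete with a simple mixture. The first two functions implement the post's own independence formulas; the third is a hypothetical model (not from the source) in which, with probability `q_correlated`, a single event takes out every node at once.

```python
def p_single_server_loss(p_month: float, months: int) -> float:
    """P(a lone server fails at least once over `months` independent months)."""
    return 1.0 - (1.0 - p_month) ** months


def p_all_fail_in_month(p_month: float, servers: int) -> float:
    """P(all `servers` fail in the same month), assuming independent failures."""
    return p_month ** servers


def p_all_fail_with_correlation(p_month: float, servers: int, q_correlated: float) -> float:
    """Hypothetical mixture: with probability q a correlated event (e.g. AZ-wide)
    takes out every server; otherwise servers fail independently."""
    return q_correlated + (1.0 - q_correlated) * p_all_fail_in_month(p_month, servers)


if __name__ == "__main__":
    # 1%/month is the post's placeholder rate; 120 months = 10 years.
    print(f"single server, 10 years: {p_single_server_loss(0.01, 120):.1%}")
    print(f"three servers, one month: {p_all_fail_in_month(0.01, 3):.4%}")
    # Hypothetical 0.01% monthly chance of a correlated all-node event.
    print(f"with correlated risk: {p_all_fail_with_correlation(0.01, 3, 1e-4):.4%}")
```

The first two lines reproduce the quoted >70% and 0.0001% figures. In the third, even a small correlated term dominates the independent 1-in-a-million product, which is why node placement and backups matter beyond the replica count.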

What Metal is built on

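Direct-attached NVMe on EC2 storage-optimised instance types. The announcement-window posts don't name the exact types (i4i, i3en, im4gn are the industry-standard candidates); later posts cite i4i.4xlarge in Crowley's IOPS comparison and i8g as the PlanetScale for Postgres shape on AWS.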
Vitess for MySQL clusters (the original PlanetScale substrate).
Postgres as the alternative engine (also on Metal; see PlanetScale system page).
Primary + 2 replicas by default — 3-way cluster replication for durability.
Automated node lifecycle — health monitoring, replacement provisioning, reparent on node loss.

Seen in

— Canonical wiki disclosure of the Cluster Metrics four-category memory chart (active cache / inactive cache / RSS / memory-mapped) shown on the PlanetScale dashboard for Metal clusters. Simeon Griggs (PlanetScale, 2026-03-30) documents that the dashboard's single "% memory used" number is a sum over behaviourally-distinct categories, and the stacked-chart view decomposes into reclaimable-cache vs non-reclaimable-RSS. Discloses the ~1000× RAM vs NVMe latency ratio ("reading a page from RAM is roughly 1,000 times faster than reading it from even a fast NVMe drive") — completing the storage-latency hierarchy disclosure for Metal (prior posts documented ~50 μs NVMe vs ~250 μs EBS; this adds the RAM tier one layer up). Reinforces the Metal product thesis that page-cache utilisation is the right place to spend memory: RAM hits avoid the NVMe-IO cost entirely, so aggressive cache utilisation is "the goal, not a side effect."
— First canonical self-reported Metal production-migration datum on a real internal PlanetScale workload. Rafer Hazen (2025-03-11) migrated the Insights backing database (an 8-shard MySQL/Vitess cluster serving "approximately 10k UPDATE/INSERT statements per second" from "32 consumer processes, each with 25 writer threads for a total max concurrency of 800 threads" reading Kafka) from EBS with provisioned IOPS to Metal. The team picked the busiest of the 8 shards first (first canonical wiki instance of the canary-shard substrate migration pattern), soaked it for "a few days", then rolled the remaining 7 shards, which showed "nearly identical improvement in performance." The latency improvement is reported as "substantial" at p50 / p90 / p95 / p99 across all four percentile graphs; the previously-worst shard became the fastest after the swap. Downstream effect: "lower average backlog in our Kafka consumers, and has given us additional capacity to handle increasing message volume in the future" (canonicalised as Kafka consumer backlog as downstream-storage back-pressure signal). Load-bearing wiki claim for Metal's substrate-swap positioning: "Without making any changes to our application, architecture, or sharding configuration, we were able to realize substantial performance improvements by upgrading to PlanetScale Metal." First canonical datum that Metal pays for a workload that is I/O-latency-sensitive (concepts/io-latency-sensitive-workload) rather than IOPS-cap-sensitive: PlanetScale had already applied sharding-as-IOPS-scaling (8 shards) and provisioned-IOPS EBS upgrades, and the binding constraint remained per-write latency. Metal fixed what sharding + more IOPS could not. Complements the 2023-08-10 sibling post () which architected the Insights pipeline on EBS; this 2025-03-11 post migrates the same pipeline to Metal 19 months later, closing the Insights substrate-migration loop. Caveat: no absolute latency numbers are published (graphs show direction + shape only), and no cost delta is disclosed.
— Canonical architectural launch-post capstone (Richard Crowley, 2025-03-11). Most accessible single-article framing of Metal's thesis: bundles the latency case (Dicken 2025-03-13), the reliability case (Van Wiggeren 2025-03-18), and the economics + durability math into one piece. Canonical one-sentence substitution statement: "Metal differs from the PlanetScale you already know well in exactly one way: We've substituted Amazon EBS and Google Persistent Disk with the fast, local NVMe drives available from the cloud providers." Canonical instance-type pairing: r6i.4xlarge + EBS = 40,000 IOPS vs i4i.4xlarge local NVMe = 220,000 random write / 400,000 random read IOPS, a 5.5-10× IOPS ratio on the same vCPU class. Canonical production-migration datum: million-QPS workload, EBS ~1ms I/O latency → Metal μs-scale, p99 query latency 9ms → 4ms (56% reduction by swapping substrate alone). Canonical durability-math instance for patterns/direct-attached-nvme-with-replication: primary + 2 replicas across 3 AZs, semi-sync MySQL replication, daily tested-restore backups, automated replica replacement; under 1%-monthly-instance-failure + 5min-EBS-reattach + 5hr-backup-restore assumptions (all Crowley's own "unfair to Metal" choices), write-availability loss ≈ 0.000001% and data loss ≈ 0.00000000003%. Canonical IOPS-per-dollar table: r6a + EBS variants span 0.84-13.2 IOPS/$; i4i + local NVMe sits at 58.41-58.50 IOPS/$ uniformly, a 13-17× price-performance advantage at 4xlarge scale, plus Reserved-Instance / Savings-Plan discount runway that "Amazon EBS cannot be discounted by". Canonical EBS volume-type price spread: "$80 per TB for the slowest configuration to $2,573 per TB for the highest-performance EBS io2 volumes", a ~32× price spread within EBS alone, which Metal dissolves entirely.
Workload-shape guidance for when Metal pays: (a) random reads on workloads larger than the buffer pool; (b) "working sets that don't fit into the InnoDB buffer pool" (Metal can replace a read-replica fleet or memcached); (c) massive write throughput where replicas can't keep up; (d) workloads that can't tolerate high latency. Philosophical frame: "Hardware is really good now. The rest is PlanetScale doing everything it takes to let that hardware shine." Metal is the exit from the HDD-era "storage is slow anyway; the network hop is free" assumption.
sources/2026-04-21-planetscale-increase-iops-and-throughput-with-sharding — Metal framed as the substrate-level alternative to horizontal sharding for escaping the EBS IOPS cost cliff. Ben Dicken's 2024-08-19 article was written before Metal launched (March 2025); the re-fetched 2026-04-21 version opens with a retrofitted Note: "Since this article was written, we have released [PlanetScale Metal]. Metal databases give you unlimited IOPS and ultra low latency reads and writes. If you need a database with incredible IO performance, [check out Metal]." Canonical wiki statement that Metal and sharding-as-IOPS-scaling are two architectural answers to the same problem: the original article sold sharding, and Metal's local-NVMe substrate dissolves the need for the sharding workaround. Sharding still matters for data-size and write-throughput reasons, but the IOPS cost cliff (8× workload → 11-13× cost on RDS+io1) is structurally gone on Metal. Pairs with the 2025-03-13 IO-devices post (Metal as latency fix) and the 2025-03-18 failure-rate post (Metal as reliability fix) to bracket Metal's three-part architectural thesis across latency / reliability / cost.
sources/2026-04-21-planetscale-faster-planetscale-postgres-connections-with-cloudflare-hyperdrive — canonical customer-facing positioning as the "benchmarked fastest cloud Postgres" paired with Cloudflare's edge stack. The smallest Metal cluster ($50/month) is used for the Simeon Griggs 2026-02-19 prediction-market real-time demo; the post reiterates "PlanetScale Metal databases are powered by blazing-fast, locally-attached NVMe SSD drives instead of network-attached storage" as the durability-vs-latency posture that makes Metal the correct authoritative-store pick for the real-time workload. Canonical wiki anchor for the Metal-as-source-of-truth half of patterns/db-authoritative-with-websocket-notify.
— First wiki datum that Metal specifically accelerates SPANN posting-list I/O for vector indexes. The GA post for PlanetScale vectors names Metal as the ideal substrate: "Using vector indexes on PlanetScale Metal ensures that loading vector partitions from InnoDB to answer queries will be as fast as possible." Concrete composition: SPANN's query path reads bounded sets of SSD-resident posting lists per query (see systems/spann); Metal's ~50 μs local-NVMe round-trip (vs ~250 μs EBS) directly compresses the SPANN partition-load latency floor. The ~80/20 SSD/RAM split of the SPANN index (see concepts/larger-than-ram-vector-index) places the majority of index I/O on the SSD substrate Metal optimises.
sources/2025-03-13-planetscale-io-devices-and-latency — Metal announcement / latency argument. Establishes the ~50 μs local vs ~250 μs EBS latency datum, the 3,000-IOPS gp3-default cap vs "no artificial cap" on Metal, and the replication-for-durability argument.
— Complementary reliability argument. "This is why we built PlanetScale Metal. With a shared-nothing architecture that uses local storage instead of network-attached storage like EBS, the rest of the shards and nodes in a database are able to continue to operate without problem."
— Metal architecture extended from MySQL to Postgres. PlanetScale's launch announcement for PlanetScale for Postgres brings Metal's direct-attached-NVMe cluster shape to Postgres: "PlanetScale Metal's locally-attached NVMe SSD drives fundamentally change the performance/cost ratio for hosting relational databases in the cloud. We're excited to bring this performance to Postgres." Metal-for-Postgres inherits the same primary + 2 replicas + direct-attached-NVMe topology + no IOPS cap as Metal-for-MySQL. First wiki datum that Metal is engine-agnostic at the cluster-shape layer: the local-NVMe + replication architecture is a substrate under both engines, not a MySQL-specific choice.
— First benchmark-measured Metal-adjacent datum on the wiki. Ben Dicken's Postgres 17 vs 18 benchmarks use an i7i.2xlarge with 1.8 TB local NVMe as the canonical direct-attached-NVMe-on-EC2 instance. The i7i wins every scenario tested: every Postgres config × concurrency × range-size combination on sysbench oltp_read_only. Even Postgres 18's new async-I/O modes (worker, io_uring) fail to close the gap from EBS to local NVMe. Price-performance: the i7i at $551.15/mo beats r7i+io2-16k at $1,513.82/mo and narrowly beats r7i+gp3-10k at $492.32/mo on a per-QPS basis. While PlanetScale doesn't benchmark Metal itself (the post uses bare EC2 instances to avoid the "vendor benchmarks own product" framing), the i7i data is the closest vendor-neutral proxy for what Metal delivers on a primary + 2 replicas topology. Empirical backing for the "we built PlanetScale Metal" structural argument: the same class of local-NVMe instance that Metal relies on is what wins this benchmark.
sources/2026-04-21-planetscale-benchmarking-postgres — Names Metal's on-AWS reference shape. PlanetScale's multi-vendor Postgres benchmark methodology post confirms that PlanetScale for Postgres runs on i8g M-320 (4 vCPU, 32 GB RAM, 937 GB NVMe SSD) with a primary + 2 replicas across 3 AZs. This is the Metal-on-Postgres-on-AWS shape, the direct successor to the 2025-10-14 post's i7i.2xlarge proxy datum. All benchmarked competitors (Aurora, AlloyDB, CrunchyData, Supabase, TigerData, Neon) use network-attached storage; Metal is the only local-NVMe point in the comparison. The post's availability-posture-equalisation rule ("each would also need to add replicas") is the accounting corollary of Metal's default 3-AZ topology. The Telescope harness publishes these benchmarks per the new patterns/reproducible-benchmark-publication pattern, and price-performance becomes the fourth benchmark dimension Metal's architecture is engineered to dominate.
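Several entries above (the page-cache chart, the buffer-pool workload guidance, SPANN's SSD-resident posting lists) reduce to the same hit-ratio arithmetic. A minimal sketch, assuming each page read is either a RAM hit or a device read with no other costs; the RAM latency is derived from the quoted "~1,000 times faster" ratio and is an assumption, not a published figure.

```python
NVME_US = 50.0           # ~50 μs local NVMe read (figure quoted on this page)
EBS_US = 250.0           # ~250 μs EBS read (figure quoted on this page)
RAM_US = NVME_US / 1000  # assumed: derived from the "~1,000x faster than NVMe" ratio


def effective_read_us(hit_ratio: float, miss_us: float, hit_us: float = RAM_US) -> float:
    """Mean read latency when a fraction `hit_ratio` of page reads hit RAM."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    return hit_ratio * hit_us + (1.0 - hit_ratio) * miss_us


if __name__ == "__main__":
    for h in (0.90, 0.99, 0.999):
        nvme = effective_read_us(h, NVME_US)
        ebs = effective_read_us(h, EBS_US)
        print(f"hit={h:.3f}  NVMe={nvme:.3f} us  EBS={ebs:.3f} us")
```

Even at a 99% hit ratio the miss term dominates the mean, so the two levers reinforce each other: aggressive page-cache use shrinks the miss fraction, and local NVMe shrinks the cost of each remaining miss.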

Caveats

Limited Metal production benchmarks. The 2025-03-13 IO-devices post doesn't publish QPS / p99 / IOPS numbers for Metal against EBS-backed Aurora or RDS; its latency figures (~50 μs) are illustrative teaching numbers, not Metal measurements. The failure-rate figures are EBS-side arithmetic under stated assumptions, not Metal-side measurements. Later posts listed under Seen in add partial production and benchmark data.
Metal's own failure modes not fully described. Local-NVMe drive failure rates, instance termination handling, noisy-neighbour behaviour on shared EC2 hardware, and cross-replica consistency during reparent on Metal are not architecturally detailed in the announcement posts.
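Shared-nothing envelope still has a failure-domain boundary. An AZ-wide power event would take out all three nodes only if they were co-located in one AZ; Crowley's launch post and the Postgres benchmarking post place the primary + 2 replicas across 3 AZs. Region-wide events and Metal's cross-region replication topology aren't spelled out.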
Resize migration is not instant. Resizing is advertised as "with zero downtime", but migrating data to new instances takes real time; during that window the cluster is in a transient-topology state.
Instance types only partially disclosed. On AWS, direct-attached NVMe comes from EC2 storage-optimised instance types (such as i3en, i4i, im4gn); the announcement-window posts don't say which ones Metal runs on, and only the later Postgres benchmarking post names one (i8g).
Metal competes with PlanetScale's own EBS-backed tier. Some workloads that don't stress EBS (cached read-heavy, small-volume OLTP) may not see enough benefit to justify the pricing delta; the announcement posts don't quantify the price gap between the two tiers.

When-to-shard implication

positions Metal as a substrate-level shift in the write-throughput ceiling: "With PlanetScale Metal, write throughput is significantly less of a concern. Unlike other solutions such as RDS and CloudSQL that separate storage and compute, Metal keeps them together on the same hardware. This reduces network hops and provides better hardware, delivering substantially higher IOPS. If you're on Metal, you can often delay sharding and continue scaling vertically much further (into the several TB range) than traditional cloud database architectures allow." In scaling-ladder terms, Metal extends rung 1 (vertical scaling) further before rung 4 (horizontal sharding) becomes necessary.