PLANETSCALE 2023-06-28

PlanetScale — How PlanetScale keeps your data safe

Summary

Sam Lambert (PlanetScale CEO, 2023-06-28) writes a marketing-flavoured multi-layered-defence overview of PlanetScale's data-safety story — not a mechanism deep-dive but a canonical enumeration of the seven layers PlanetScale counts as its data-safety envelope: (1) Vitess as the clustering substrate, (2) MySQL's ACID transactional guarantees, (3) MySQL semi-synchronous replication for replica-ack-gated commits, (4) cloud block storage (AWS EBS / Google Cloud Persistent Disk) for multi-drive redundancy and self-healing, (5) safe migrations + Revert as the "undeploy-a-schema-change-without-data-loss" lever, (6) mandatory validated backups on every plan, (7) TLS-only connections + encrypted storage + auto-invalidation of leaked credentials + storage-engine-maturity-as-risk-floor.

The post's load-bearing contribution to the wiki is NOT new mechanism (every one of the seven layers has a deeper dedicated post elsewhere in the PlanetScale corpus — Noach's semi-sync piece, Morrison II's automated-backup-validation piece, Barnett's schema-revert launch post, Crowley's Metal-on-NVMe semi-sync-durability framing). What it does contribute is (a) one-article enumeration of the seven layers as a single defensive envelope — useful as a canonical "data-safety story table of contents" reference; (b) the novel GitHub-leaked-credential auto-invalidation primitive ("If you accidentally push a PlanetScale database credential into a public GitHub repository, it will be automatically invalidated within seconds to prevent unwanted data access") which is a previously-uncaptured safety primitive on the wiki; (c) a verbatim statement of Vitess production scale: "Vitess clusters will have served 10s of millions of users and 100s of millions of queries across 100s of petabytes of data" in the reading-time window of this blog post; (d) the storage-engine-maturity-as-data-risk framing — "If you are trusting a storage engine that has been around for less than a decade, you are taking extreme risk with your most important asset: your data" — a 28-years-of-MySQL / 10+-years-of-Vitess argument against NewSQL / novel-storage-engine database platforms; (e) naming Slack, Hubspot, Etsy as Vitess's primary-datastore flagship users in one sentence. Architecture density ~30% (heavily marketing-inflected, Sam-Lambert-CEO-voice) but clears the Tier-3 bar via (b) and (d) — the GitHub-credential-invalidation primitive and the storage-engine-maturity argument are genuinely novel to the wiki.

Key takeaways

  • Seven layers of data safety, enumerated as a single envelope. Lambert lists the defensive layers in order of depth: (1) Vitess clustering substrate ("Whenever you create a database on PlanetScale you are actually creating a complete Vitess cluster"); (2) MySQL ACID ("MySQL is well-known for its support of ACID … transactions are serializable and predictable. This means that even if the database is interrupted by a system failure or network issue, transactions are either executed in full or not at all"); (3) Semi-synchronous replication ("MySQL's semi-synchronous replication further enhances data durability by ensuring that transactions are replicated to multiple servers. Semi-synchronous replication is a mode of replication in which the master waits until at least one replica acknowledges receipt of the transaction before moving on to the next one"); (4) Cloud block storage (EBS / GCPD) ("mount the MySQL data volume on cloud block storage, such as Amazon Web Services (AWS) Elastic Block Store (EBS) and Google Cloud Persistent Disk (GCPD), which are designed to be highly durable and reliable. Both EBS and GCPD use data replication to ensure that data is stored redundantly across multiple drives"); (5) Safe migrations + Revert ("safe migrations, which protects against potentially destructive actions such as accidentally dropping a column or table. Safe migrations forces all schema changes to go through a deploy request, which is auditable, rate limited, and, most importantly, revertable. If you drop the wrong column or table, Revert allows you to instantly undeploy a schema change without any data loss"); (6) Mandatory validated backups ("All PlanetScale databases have a mandatory backup schedule included with every database plan at no additional cost. … each new mandatory backup restores from a previous backup to validate that it was taken properly and ensure that there is always at least one healthy backup before your database's binary logs are rotated out"); (7) Security ("All PlanetScale databases are encrypted at rest and in transit. It is impossible to connect to a PlanetScale database without an SSL certificate … If you accidentally push a PlanetScale database credential into a public GitHub repository, it will be automatically invalidated within seconds"). Canonical wiki framing: seven defensive layers = seven independent failure modes, so data loss requires every layer to fail independently — a defence-in-depth posture where no single layer is load-bearing for the durability guarantee.

  • "It is impossible to connect to a PlanetScale database without an SSL certificate." Verbatim statement of SSL/TLS enforcement as a platform invariant: no unencrypted client connections are admitted. "It is impossible to connect to a PlanetScale database without an SSL certificate and we ensure all credentials are generated by PlanetScale to guarantee they meet the strictest complexity requirements." Canonical wiki concept: SSL enforcement as a platform-level invariant rather than a per-database configuration knob. Paired with the TLS-as-per-request-tax conversation captured at concepts/ssl-handshake-as-per-request-tax — the enforcement is what creates the tax that the HTTP-multiplexed serverless driver was built to amortise. The post also specifies platform-generated credentials with strictest complexity requirements — i.e., the operator never types their own password; PlanetScale issues credentials, removing the weak-password failure mode.

  • Leaked-credential auto-invalidation via GitHub secret scanning. Verbatim: "If you accidentally push a PlanetScale database credential into a public GitHub repository, it will be automatically invalidated within seconds to prevent unwanted data access." Canonical wiki primitive: leaked-credential auto-invalidation — the operator-side safety lever where the database platform subscribes to GitHub's secret-scanning notifications (or a similar substrate) and automatically revokes any credential that appears in a public commit. Latency budget: "within seconds" — i.e., before an automated scanner scraping public GitHub activity has had time to exploit the leak. This composes with the PlanetScale-generated-credentials property: since all credentials follow a predictable pattern, GitHub's secret-scanning can recognise them cleanly without false positives, and PlanetScale can revoke with confidence that nothing else is using that credential format. The within-seconds SLO is load-bearing because GitHub push → public-scan → credential-harvest is typically in the 1-5 minute window for high-volume scanners, so revocation must beat that race.
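A hypothetical sketch of what the platform-side revocation path could look like, assuming GitHub's secret-scanning partner program as the leak-detection substrate. The post does not disclose the mechanism; the payload shape, endpoint, and revoke_credential() helper below are illustrative, not PlanetScale's actual API:

```python
# Hypothetical sketch of a platform-side revocation hook; mechanism and
# payload shape are assumptions, not PlanetScale's disclosed design.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def revoke_credential(token: str) -> None:
    """Illustrative stub: mark the credential revoked in the control
    plane and propagate to every edge. The 'within seconds' SLO implies
    this path is synchronous, not queued behind a batch job."""

class SecretScanningHook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        # Production code must first verify the scanner's signature
        # header before acting on the payload; omitted for brevity.
        for leak in json.loads(body):
            token = leak["token"]
            # Platform-generated credentials share a recognisable
            # prefix, so scanner matches are unambiguous.
            if token.startswith("pscale_pw_"):
                revoke_credential(token)
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8443), SecretScanningHook).serve_forever()
```

The platform-generated-credentials property is what makes this tractable: a fixed, owned credential namespace gives the scanner a precise pattern and gives the platform confidence that revocation cannot collide with anything else.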

  • Safe migrations as mandatory deploy-request + Revert undo. Lambert frames Schema revert (canonicalised in Barnett's 2022 launch post and Guevara+Noach's 2022 internals post) as a data-safety primitive: "If you drop the wrong column or table, Revert allows you to instantly undeploy a schema change without any data loss. This turns multi-hour outages into a couple of seconds." The multi-hour outages → couple of seconds comparison is the canonical customer-impact framing: a destructive DDL on a traditional database means restore-from-backup (hours) or a logical-dump re-ingest (hours); Revert makes it seconds because the former production table was kept alive as a ghost with inverse replication (see the sketch below). See patterns/instant-schema-revert-via-inverse-replication for the mechanism page. Lambert's safe-migrations framing adds three properties to the deploy-request primitive: auditable (tracked change record), rate limited (can't accidentally batch-destroy), and revertable (the Revert primitive). The rate-limiting property is the novel datum — previous wiki framings of deploy requests emphasise auditability + revert, not rate limiting.
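An illustrative sketch (not PlanetScale's code) of why the revert is a seconds-scale operation: the former production table stays alive as a ghost fed by the inverse replication stream, so Revert swaps tables rather than restoring data. write() and replicate_inverse() are stand-ins for the real write path and the VReplication stream:

```python
# Illustrative sketch of the revert window; stubs stand in for the
# real SQL write path and the VReplication inverse stream.
from dataclasses import dataclass

def write(table: str, row: dict) -> None: ...              # real: SQL write
def replicate_inverse(table: str, row: dict) -> None: ...  # real: VReplication

@dataclass
class DeployRequest:
    old_table: str   # former production table, kept alive as a ghost
    new_table: str   # schema-changed table, now serving traffic

    def apply_write(self, row: dict) -> None:
        write(self.new_table, row)               # production write
        replicate_inverse(self.old_table, row)   # ghost stays in lockstep

    def revert(self) -> None:
        # Because the ghost never fell behind, revert is a metadata
        # swap: seconds and no data loss, not an hours-long restore.
        self.old_table, self.new_table = self.new_table, self.old_table
```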

  • Mandatory backup schedule + restore-replay validation on every plan. Verbatim: "All PlanetScale databases have a mandatory backup schedule included with every database plan at no additional cost. Backups are essential safeguards against application bugs that delete data and can go undetected for a long time. To ensure our backups are valid, each new mandatory backup restores from a previous backup to validate that it was taken properly and ensure that there is always at least one healthy backup before your database's binary logs are rotated out." Canonical wiki composition: mandatory-not-optional + every-plan-not-premium + no-additional-cost + validated-via-restore-replay + binlog-rotation-gated — five orthogonal properties. The binlog-rotation-gated property is subtle and important: binary logs can only be rotated (deleted from disk) once there is a validated backup that includes the data they describe — so the backup validation pipeline is a binlog-retention gate, preventing the operational footgun where binlogs rotate out before the next backup has been validated. See concepts/automated-backup-validation for the canonical pipeline description. Lambert's framing adds the "application bugs that delete data and can go undetected for a long time" motivation — i.e., backups defend against slow-bug-induced data loss where the operator doesn't notice the bug for days or weeks, not just fast-incident recovery.
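A sketch of the gate's shape under stated assumptions (the post states the property, not the pipeline internals; every name below is illustrative): binlogs rotate out only once a restore-validated backup covers them:

```python
# Assumed shape of the restore-replay validation gate. Key property:
# binlog rotation is blocked behind a restore-validated backup.
def alert(msg: str) -> None: ...  # illustrative stub

def run_backup_cycle(storage, binlog) -> None:
    new_backup = storage.take_backup()
    previous = storage.latest_backup(before=new_backup)
    # Validation is an actual restore into a scratch instance, not a
    # checksum: the backup must come up healthy to count.
    if not storage.restore_to_scratch(previous).healthy():
        alert("backup validation failed; binlog rotation stays blocked")
        return
    previous.mark_validated()
    # Only now is it safe to delete binlogs older than the validated
    # backup: there is always one known-good backup plus the binlogs
    # needed to roll forward from it.
    binlog.rotate_out(older_than=previous.covers_until())
```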

  • Cloud block storage (EBS + GCPD) as self-healing multi-drive-replicated durability. Verbatim: "EBS and GCPD use data replication to ensure that data is stored redundantly across multiple drives, which helps to reduce the risk of data loss due to hardware failures or other issues. In addition, both EBS and GCPD are designed to be self-healing, meaning they can detect and repair data inconsistencies automatically without user intervention." Canonical wiki framing: cloud block storage as substrate durability primitive — the PlanetScale post treats EBS / GCPD as a trusted durable-storage abstraction whose multi-drive replication is the fourth defensive layer. The self-healing property is the novel framing: not just replication, but automatic inconsistency detection + repair — the storage layer itself runs scrub/repair cycles to keep the replicated copies mutually consistent, so the database layer above never sees bit-flip or partial-write-induced corruption. Paired with the canonical Dicken 2025-03-13 I/O devices post and Van Wiggeren 2025-03-18 EBS real failure-rate post, which invert this 2023-era Lambert framing, arguing EBS is less reliable than Lambert claims and that local-NVMe-with-replication (PlanetScale Metal) is the better substrate. Contradiction acknowledged: the 2023 Lambert post frames EBS/GCPD as trustworthy; the 2025 Dicken/Van Wiggeren/Crowley posts rework that framing to justify the Metal tier on local NVMe + application-layer replication. Both positions coexist — the 2023 post's claim (EBS is durable enough) remains true for the default tier; the 2025 posts establish that a higher-performance, higher-reliability tier needs something different.

  • Semi-sync replication framed as "master waits until at least one replica acknowledges". Verbatim: "Semi-synchronous replication is a mode of replication in which the master waits until at least one replica acknowledges receipt of the transaction before moving on to the next one. This feature ensures that in case of a primary node failure, the replica that has received the transaction is up-to-date and can be promoted as the new primary node without data loss." Canonical framing: semi-sync not as a mechanism but as a failover-preserving-durability primitive — the replica-ack guarantees that when the primary fails, at least one replica has the transaction and can be promoted losslessly. Pairs with the Englander 2025-07-03 extreme fault tolerance post which frames semi-sync explicitly as the substrate underneath weekly-failover-as-routine ("Commits stored durably on at least one replica before primary sends acknowledgment to the client. Enables us to treat replicas as potential primaries, and fail over to them immediately as needed") and the Noach semi-sync deep-dive for the full mechanism. Lambert's framing is the shortest-form marketing statement of the semi-sync contract — the load-bearing clause is "without data loss", which aligns with Noach's "durability, not consistency" framing.
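For reference, a hedged sketch of the upstream MySQL knobs that implement this contract, issued over an ordinary admin connection with placeholder host and credentials. These variables exist only once the semi-sync plugin is installed; the pre-8.0.26 spellings are shown, and newer MySQL releases also accept the rpl_semi_sync_source_* names:

```python
# Sketch of the MySQL knobs behind the semi-sync contract; host and
# credentials are placeholders. Not PlanetScale-specific configuration.
import pymysql

conn = pymysql.connect(host="primary.internal", user="admin", password="***")
with conn.cursor() as cur:
    cur.execute("SET GLOBAL rpl_semi_sync_master_enabled = ON")
    # Commit acknowledgment blocks until this many replicas have
    # acknowledged receipt of the transaction's binlog events.
    cur.execute("SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1")
    # AFTER_SYNC ("lossless" semi-sync): the client never sees a commit
    # succeed unless a replica already holds it, so a promoted replica
    # is never missing an acknowledged transaction.
    cur.execute("SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC'")
```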

  • Vitess production-scale datum: "10s of millions of users, 100s of millions of queries, 100s of petabytes of data." Verbatim: "In the time it takes you to read this blog post, Vitess clusters will have served 10s of millions of users and 100s of millions of queries across 100s of petabytes of data." Canonical wiki datum: 100s of petabytes of Vitess-managed data in aggregate across Slack, Hubspot, Etsy and peers, reported in 2023-06-28. Reading-time framing (~5 minutes for this post) implies 100s of millions of queries / ~300 s ≈ 0.3-3M QPS aggregate across all Vitess clusters in that window, though the post uses the reading-time framing as marketing rhetoric rather than a precise benchmark. The more precise 2023-era Vitess datum is JD.com's 35M QPS Singles Day peak (see sources/2026-04-21-planetscale-horizontal-sharding-for-mysql-made-easy). Slack, Hubspot, Etsy named as "primary datastore" users — i.e., load-bearing for those companies, not a peripheral cache.

  • Storage-engine-maturity-as-data-risk argument. Lambert's closing framing: "MySQL has been serving mission-critical applications at web scale for 28 years. Layering on Vitess, which has served some of the largest sites on the planet for over a decade, you know that every code path has been battle hardened. Database storage engines take a long time to get right. If you are trusting a storage engine that has been around for less than a decade, you are taking extreme risk with your most important asset: your data." Canonical wiki concept: storage-engine maturity as data risk — the framing that years-of-production-exposure is the substrate durability metric for a database platform, independent of its advertised feature set. The 28-years-for-MySQL + 10-years-for-Vitess pairing is a layered-maturity argument: each layer's code paths hardened over its own multi-year production exposure. The "less than a decade" cutoff is the implicit positioning shot at NewSQL peers (Spanner, CockroachDB, TiDB, Yugabyte) and Aurora-era cloud-native MySQL — all of which by 2023 are under or near the decade mark. This framing complements concepts/sharded-failure-domain-isolation and concepts/blast-radius from the 2023-era Three surprising benefits of sharding post — sharding limits how much data a bug affects; storage-engine-maturity limits how likely a data-corrupting bug is in the first place.

  • Encryption at rest + in transit as a matched pair. Lambert lists these together: "All PlanetScale databases are encrypted at rest and in transit." The in-transit side is the TLS enforcement; the at-rest side is both primary-storage encryption (EBS / GCPD encryption) and backup encryption (see concepts/backup-encryption-at-rest for the 2024-07-30 Dicken-post canonicalisation). Lambert's 2023 framing is the matched-pair property — no unencrypted state exists anywhere in the database envelope, whether flowing between client and server or resting on disk or in object storage. This aligns with compliance frameworks (SOC 2, HIPAA, PCI) that require end-to-end encryption.

Operational numbers

  • 28 years of MySQL production — MySQL's age at 2023-06-28 (first public release 1995-05-23). Lambert's "serving mission-critical applications at web scale for 28 years".
  • 10+ years of Vitess production — Vitess's age at 2023-06-28 (YouTube origin 2010). Lambert's "has served some of the largest sites on the planet for over a decade".
  • 100s of petabytes — aggregate Vitess-managed data across Slack / Hubspot / Etsy and peers, as reported 2023-06-28.
  • 10s of millions of users, 100s of millions of queries — aggregate Vitess per-reading-time-window throughput metric (~5 min reading time).
  • Seconds latency SLO on credential auto-invalidation — "automatically invalidated within seconds to prevent unwanted data access".
  • Seconds latency SLO on schema revert — "This turns multi-hour outages into a couple of seconds."
  • Less than a decade — Lambert's threshold cutoff for "extreme risk" storage-engine adoption.
  • 3 named Vitess flagship users: Slack, Hubspot, Etsy (all as "primary datastore").
  • 2 named cloud block-storage substrates: AWS EBS, Google Cloud Persistent Disk (GCPD).
  • 7 defensive layers: Vitess / MySQL ACID / semi-sync / block storage / safe-migrations + Revert / mandatory validated backups / security (TLS + encryption + credential auto-invalidation).

Caveats

  • Marketing-heavy, Sam-Lambert-CEO voice. The post is a positioning piece — it says what the data-safety envelope is, not how each layer is mechanised. Every single layer has a deeper dedicated post in the PlanetScale corpus (Noach on semi-sync, Morrison II on backup validation, Barnett on schema revert, Crowley on Metal-on-NVMe, Dicken on EBS reliability). This post's value to the wiki is the one-article enumeration of the seven layers as a single envelope, plus the novel GitHub-credential-auto-invalidation primitive and the storage-engine-maturity argument — not mechanism depth. Architecture density ~30%; clears Tier-3 via (1) canonical enumeration, (2) new credential-invalidation primitive, (3) storage-engine-maturity framing. The remaining ~70% is trust-building marketing prose ("At PlanetScale we take data safety extremely seriously") and CEO-voice framing. Per AGENTS.md borderline-cases rule, the 30% architecture content is just above the 20% threshold — include.

  • 2023-era EBS-as-durable framing contradicts 2025-era Dicken/Van Wiggeren/Crowley reframing. Lambert's 2023 post frames EBS as "designed to be highly durable and reliable" and "self-healing" — a trust-the-substrate framing. The 2025-era PlanetScale posts invert this framing to argue for local-NVMe-with-application-layer-replication (PlanetScale Metal). Not a literal contradiction — both claims can be true in different operational tiers — but a framing shift worth naming. The 2023 post's frame remains accurate for the default PlanetScale tier on EBS; the 2025 posts establish that a higher tier requires a different substrate. See sources/2025-03-13-planetscale-io-devices-and-latency, sources/2025-03-18-planetscale-the-real-failure-rate-of-ebs, systems/planetscale-metal.

  • "It is impossible to connect without an SSL certificate" is a mutual-TLS-adjacent claim but not MTLS per se. Lambert's framing says the client must present a certificate, but it doesn't disclose whether this is client-cert MTLS or server-cert-plus-username-password-over-TLS. Subsequent PlanetScale posts (the 2024 HTTP/3 post, the HTTP-API serverless driver posts) make clear that the server-cert + MySQL auth over TLS is the standard path — not client-certificate MTLS. The "cannot connect without an SSL certificate" phrase reads as client validates server certificate rather than client presents its own certificate. Noted as a framing ambiguity.

  • Credential auto-invalidation mechanism not disclosed. Lambert claims credentials are auto-invalidated "within seconds" when pushed to a public GitHub repo, but doesn't disclose the mechanism — is this GitHub's secret-scanning partner program? A PlanetScale-operated scraper? Both? The wiki's canonical framing at concepts/leaked-credential-auto-invalidation treats this as the substrate-agnostic primitive — a platform that owns the credential namespace and subscribes to credential-leak-detection substrates (whether GitHub's partner program or something else) can revoke leaked credentials within the scanner-race window. The exact mechanism is below the abstraction layer of this post.

  • No disclosure of the backup retention window or cadence. Lambert says the backup schedule is mandatory and no-extra-cost but doesn't disclose the default retention (14 days? 30 days? plan-dependent?) or cadence (hourly? daily?). The validation mechanism is stated (restore-replay-before-binlog-rotation) but the SLAs around retention aren't. The sibling 2024-07-30 Dicken faster-backups-with-sharding post (sources/2026-04-21-planetscale-faster-backups-with-sharding) discloses the shard-parallel backup mechanism but also doesn't disclose retention policy.
