PlanetScale — Horizontal sharding for MySQL made easy¶
Summary¶
Lucy Burns and Taylor Barnett (PlanetScale, original 2020-10-22, updated 2023-08-31) write a pedagogical positioning piece that walks the "why horizontal sharding, why Vitess, why PlanetScale" narrative at overview altitude — no code, no operator-sequence walkthrough, no production war story. The post's load-bearing contribution to the wiki is not new mechanism (that territory is covered by Dicken's Dealing with large tables and Berquist's Guide to scaling your database) but canonical naming of the pre-Vitess era via two historical exemplars — Pinterest's 2012-era application-level sharding with Marty Weiner's "We had several NoSQL technologies, all of which eventually broke catastrophically" quote, and Etsy's two-way primary-key + shard-packing approach — and a graphical cost-curve framing that canonicalises vertical scaling as a step-function of discrete hardware leaps vs horizontal sharding as a linear cost with data volume. The post also supplies three previously-uncaptured production-scale data points for Vitess: JD.com's 35 million QPS during Singles Day, Slack's full Vitess migration surviving the WFH 2020 traffic influx, and Square's Cash App Vitess adoption.
The post is useful for three reasons: (1) it canonicalises patterns/application-level-sharding as a named legacy pattern (the wiki has treated it implicitly across Figma / Slack / Canva posts but not given it its own pattern page); (2) it names concepts/vertical-scaling-cost-step-function — the "you will end up paying for resources that you don't have an immediate need for just to prepare for anticipated spikes in traffic. Once you outgrow your current machine, the next you invest in could easily be 50% larger, while you really only end up using 10% of it" discontinuity — as a primitive distinct from generic vertical scaling; (3) it adds the JD.com / Slack / Square production-scale triad to the Vitess case-study ledger, pairing the canonical YouTube-origin story with three named non-PlanetScale production users at webscale. Architecture density ~40% (the piece is explicitly pedagogical / positioning — half the body is "why sharding at all" + "why not application-level" + "why Vitess" + "why PlanetScale"). A Tier-3 clear because the historical + production-numbers contributions are genuinely novel to the wiki.
Key takeaways¶
- Application-level sharding was the pre-Vitess norm. Verbatim: "Some companies have implemented horizontal sharding at the application level. In this approach, all of the logic for routing queries to the correct database server lives in the application. This requires additional logic at the application level, which must be updated whenever a new feature is added. It also means that the application needs to implement cross shard features. Additionally, as data grows and the initial set of shards runs out of capacity, 'resharding' or increasing the number of shards while continuing to serve traffic becomes a daunting operational challenge." Canonical wiki pattern: patterns/application-level-sharding — the pre-2010-era default that Pinterest (2012), Etsy, and many others shipped before Vitess matured. The post's specific failure-mode list: (1) application-layer routing logic must be updated for every new feature, (2) cross-shard features (joins, aggregations, transactions) must be reimplemented in the app, (3) resharding while serving traffic is a daunting operational challenge — the compounding of these three is what motivated the shift to a sharding-substrate layer below the app. A minimal sketch of the pattern follows this item.
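
The post stays at overview altitude, so here is a minimal, hedged sketch of the pattern. The hash-mod mapping, shard list, and names are illustrative assumptions, not details from the post; the three failure modes from the bullet above are called out as comments:

```python
# Illustrative sketch of application-level sharding. The hash-mod mapping,
# shard names, and key choice are assumptions for the example, not from the post.
SHARDS = ["shard-00", "shard-01", "shard-02", "shard-03"]

def shard_for(user_id: int) -> str:
    # Failure mode 1: this routing logic lives in the application and must be
    # threaded through every new feature that touches the database.
    return SHARDS[user_id % len(SHARDS)]

def group_by_shard(user_ids: list[int]) -> dict[str, list[int]]:
    # Failure mode 2: "cross shard features" -- the app has to fan queries out
    # per shard and merge results itself (joins, aggregates, transactions).
    groups: dict[str, list[int]] = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid), []).append(uid)
    return groups

if __name__ == "__main__":
    ids = [7, 42, 1001, 52360]
    print({uid: shard_for(uid) for uid in ids})
    # Failure mode 3: growing the fleet remaps most keys under hash-mod routing,
    # which is why resharding while serving traffic is "a daunting operational
    # challenge" without a substrate that can move rows underneath the app.
    SHARDS.extend(["shard-04", "shard-05"])
    print({uid: shard_for(uid) for uid in ids})
```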
- Pinterest / Etsy canonicalised as the application-level-sharding precedents. Pinterest's 2012-era sharding post (Marty Weiner) is quoted: "We had several NoSQL technologies, all of which eventually broke catastrophically" — naming the NoSQL-immaturity-era constraint that drove Pinterest back to sharded MySQL. "Pinterest mapped their data by primary key and used it to map data to the shard where it resided. Sharding in this way provided scale but traded off cross shard joins and the use of foreign keys." Etsy: "Etsy took this approach when moving to a sharded database system but added a two-way lookup primary key to the shard_id and packed shards onto hosts, automating some of the work of managing shards." Canonical historical framing: both companies later migrated some workloads to Vitess (explicitly named in the post) — the application-level approach was self-acknowledged as too operationally brittle at the scale they reached. "In both cases, however, ongoing management of shards, including splitting shards after the initial resharding, presented significant challenges." A sketch of the self-routing-key idea follows this item.
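
Neither the post nor this summary gives the actual key layouts, but the general shape of a primary key that routes itself (Pinterest's primary-key-to-shard mapping, Etsy's "two-way lookup" key) can be sketched as packing the shard id into the key. Bit widths and names below are assumptions for illustration only:

```python
# Hypothetical encoding of a shard id inside the primary key. Bit widths are
# illustrative assumptions, not figures from the post or from Pinterest/Etsy.
SHARD_BITS = 16
LOCAL_BITS = 46  # bits left for the row id local to its shard

def make_pk(shard_id: int, local_id: int) -> int:
    assert shard_id < (1 << SHARD_BITS) and local_id < (1 << LOCAL_BITS)
    return (shard_id << LOCAL_BITS) | local_id

def shard_of(pk: int) -> int:
    # The "two-way" property: the shard is recoverable from the key alone,
    # so routing a read needs no extra lookup table.
    return pk >> LOCAL_BITS

pk = make_pk(shard_id=12, local_id=987654)
assert shard_of(pk) == 12
print(pk, shard_of(pk))
```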
- Vertical scaling is a step-function cost curve; horizontal scaling is linear. Verbatim: "It's inefficient to make big leaps in hardware to overprovision for potential spikes in traffic or use. With this method, you will end up paying for resources that you don't have an immediate need for just to prepare for anticipated spikes in traffic. Once you outgrow your current machine, the next you invest in could easily be 50% larger, while you really only end up using 10% of it." The post includes an explicit graph: "A graph of comparative cost of horizontal versus vertical database scaling over time. A step-function that illustrates the unpredictable and often substantial cost associated with vertical scaling. … A linear increasing line that shows the incremental and predictable cost of adding more servers or shards as data volume increases." Canonical wiki concept: concepts/vertical-scaling-cost-step-function — the overprovisioning discontinuity where the operator must pay for unused headroom to buy a capacity runway. Worked example from the post: next-tier machine is 50% larger but the operator only uses 10% of it for the foreseeable future — i.e. the paid-but-unused ratio at tier-boundary-crossing is often >4×. Horizontal scaling escapes this discontinuity because the unit of capacity addition (one shard) is small relative to the total fleet — adding a shard buys ~1/N of additional capacity, not a doubling. A toy cost model follows this item.
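
A toy cost model makes the graph's shape concrete. Every capacity and price below is invented purely to illustrate the step-function versus linear contrast; nothing here comes from the post:

```python
import math

# Toy model of the post's cost-curve graph. Vertical scaling buys capacity in
# discrete hardware tiers (a step function with unused headroom at each jump);
# horizontal scaling adds one shard at a time (roughly linear in data volume).
# All capacities and prices are invented for illustration.
VERTICAL_TIERS = [(1, 100), (2, 220), (4, 500), (8, 1150)]  # (capacity units, $/month)
SHARD_CAPACITY, SHARD_COST = 1, 110                          # one shard ~ one capacity unit

def vertical_cost(demand: float) -> int:
    # Jump to the next whole tier, paying for headroom that is not used yet.
    for capacity, cost in VERTICAL_TIERS:
        if capacity >= demand:
            return cost
    raise ValueError("outgrew the largest available machine")

def horizontal_cost(demand: float) -> int:
    # Add shards incrementally as data volume grows.
    return math.ceil(demand / SHARD_CAPACITY) * SHARD_COST

for demand in (0.5, 1.1, 2.2, 3.0, 4.1, 6.5):
    print(f"demand={demand:>4}: vertical ${vertical_cost(demand):>4}/mo, "
          f"horizontal ${horizontal_cost(demand):>4}/mo")
```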
- Vitess emerged at YouTube in 2010. Verbatim: "Engineers at YouTube began building out the open source project Vitess in 2010. Vitess sits between the application and MySQL databases, allowing horizontally sharded databases to appear monolithic to the application. In addition to removing the complexity of query routing from the application, Vitess provides master failover and backup solutions that remove the operational complexity of a sharded system, as well as features like connection pooling and query rewriting for improved performance." Canonical wiki datum: 2010 as the Vitess origin year — supplies the missing date for the Vitess page's YouTube-origin framing. The post's four-feature Vitess scope-description: (1) query routing below the app, (2) master failover, (3) backup, (4) connection pooling + query rewriting — complements the deeper treatments in later posts (Dicken's consensus series, Morrison's What makes up a PlanetScale Vitess database). A client-side sketch of the "monolithic to the application" property follows this item.
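
As a hedged illustration of "appear monolithic to the application": the application keeps one ordinary MySQL connection to a single routing endpoint and writes unsharded SQL, while routing, pooling, and rewriting happen below it. Host, credentials, schema, and the choice of client library are placeholders, not from the post:

```python
# Sketch of the application's view of a Vitess-fronted database: one endpoint,
# plain MySQL protocol, no shard names in application code. The host, user,
# database, and table are illustrative placeholders.
import mysql.connector  # any MySQL-protocol client would do

conn = mysql.connector.connect(
    host="vtgate.internal",  # the single routing endpoint the app sees
    port=3306,
    user="app",
    password="example-password",
    database="commerce",
)
cur = conn.cursor()
# The app writes ordinary, unsharded SQL; routing to the right shard(s),
# connection pooling, and query rewriting happen below this line.
cur.execute("SELECT id, status FROM orders WHERE customer_id = %s", (42,))
print(cur.fetchall())
conn.close()
```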
- JD.com: 35 million QPS on Singles Day via Vitess. Verbatim: "JD.com, one of the largest online retailers in China, saw 35 million queries per second (QPS) run through Vitess during a peak in traffic on Singles Day." Canonical wiki datum: 35M QPS is the largest Vitess production number disclosed in wiki sources to date — previous Vitess posts cited PlanetScale's own multi-petabyte scale but no single-peak QPS number at this magnitude. Singles Day (11 November, China's single largest retail-traffic day) is the canonical stress-test event for Chinese e-commerce infrastructure.
- Slack: full Vitess migration before WFH 2020 absorbed the pandemic traffic spike. Verbatim: "Slack has migrated all their databases to Vitess, surviving the massive influx of traffic from the transition to work from home in 2020." Canonical wiki datum: the 2020 WFH traffic transition is named as a Vitess production stress test. Pairs with the existing Unified Grid post which describes the post-Vitess-migration shard-key re-axis of messages from workspace → channel. This Burns/Barnett post names the earlier Vitess-as-resilience-substrate milestone that made Unified Grid tractable.
- Square Cash App on Vitess, separately documented. The post links Square's Sharding Cash engineering blog as a fourth named Vitess production user alongside Slack, JD.com, and HubSpot. The Cash App is specifically listed as sharded via Vitess — adding financial-payments workloads to the Vitess production-user taxonomy (previously represented at webscale but not in the regulated-payments tier).
- PlanetScale is the only MySQL-compatible managed Vitess platform. Verbatim: "However, running Vitess at scale still requires a whole engineering team with the right experience. Not all organizations have the depth that Slack and Square do. Because of this, PlanetScale democratizes many Vitess features and capabilities, including horizontal sharding, online schema migrations, and more. With PlanetScale, you can unlock all of the power of Vitess in a much shorter time and without all of the required expertise, risk, and potential errors that come with running it yourself. PlanetScale is the only MySQL-compatible database platform built on top of Vitess." Canonical wiki positioning: PlanetScale's value-prop is framed as "Vitess without the team required to run Vitess" — the managed-control-plane + paved-road-feature-set that collapses the operational-expertise prerequisite. The Slack + Square comparison is deliberate: Slack and Square are named as the reference-class organisations that do have the depth to self-host Vitess, i.e. the cutoff the managed offering is positioned against.
- Borderline-managed-service claim: "We would rather PlanetScale manage them." The post quotes MyFitnessPal verbatim: "We wanted PlanetScale and Vitess to bring to MyFitnessPal what Kubernetes brought to application delivery and deployment. Databases are hard. We would rather PlanetScale manage them." Canonical wiki framing: the Kubernetes-for-databases analogy — operator-control-plane + paved-road + abstract-away-the-stateful-orchestration-problem — is named as PlanetScale's positioning by its own customers. The MyFitnessPal quote supplies an outside-in customer framing of the build-vs-buy calculus (see also the Figma 2022-era in-house-sharding post for the opposite outside-in framing: "build in-house because the deadline was aggressive").
- Sharding complexity cost — why separate sharding logic from the application. Verbatim: "From experiences like these, there is an increasing need to separate sharding logic from the application as it introduces a plethora of complexity, making the application and your database harder to manage, which, in turn, drains developer capacity and pulls your team away from building and improving on great products for your customer base." Canonical wiki discipline: sharding logic in the application = developer-capacity tax that cannot be amortised. The argument for substrate-layer sharding is framed as a team-bandwidth-reallocation argument rather than a pure-performance argument — engineering capacity freed from sharding maintenance is capacity redeployable to product work. This is the same framing Figma later inverts when explaining why they did build in-house: runway pressure + deep operational expertise + relational-model mismatch with mature NewSQL options at the time.
Operational numbers¶
- 35 million QPS — JD.com Vitess peak during Singles Day.
- 50% larger / 10% used — the post's canonical illustration of vertical-scaling overprovisioning discontinuity: next-tier machine is 50% larger but only 10% of it is immediately used, i.e. >4× paid-but-unused ratio at tier-boundary-crossing.
- 2010 — YouTube began building Vitess.
- 2020 — Slack's Vitess migration survived the WFH traffic influx.
- October 22, 2020 — post originally published; updated 2023-08-31.
- 4 Vitess features named at scope-description depth: query routing, master failover, backup, connection pooling + query rewriting.
- 4 named Vitess production users: Slack (full migration), Square (Cash App), JD.com (35M QPS), HubSpot (listed without detail). Etsy + Pinterest also migrated some workloads.
Caveats¶
- Pedagogical / positioning post, not a mechanism post. The piece explicitly does not cover shard-key selection, resharding operations, cross-shard queries, or scatter-gather trade-offs at depth — those are owned by sibling PlanetScale posts (Dealing with large tables, Guide to scaling your database, Consistent lookup vindex, Three surprising benefits of sharding). This post's contribution is historical framing + cost-curve intuition + production-user-list, not operator-playbook depth. Tier-3 clear via (1) the canonical application-level-sharding pattern naming, (2) the novel JD.com / Slack-2020 / Square production numbers, (3) the vertical-scaling step-function cost-curve framing not previously canonicalised.
- Marketing-flavoured close. The final two paragraphs are a PlanetScale product pitch ("you'll design a sharding scheme on top of your existing PlanetScale database; there is no need to change databases just because you need to scale"). Scope-filter borderline: substantive enough to include because the pedagogical + historical + production-number content in the first ~70% of the body is genuinely wiki-worthy, and the product pitch is at the end rather than throughout. Per AGENTS.md borderline-cases rule ("Only skip if architecture content is <20% of the body"), the architecture content here is ~40%, clearing the bar.
- "Updated with updated information" footnote. The post explicitly notes republication from 2020-10-22 with a 2023-08-31 update. The 2023 update is what introduced the Slack-2020 + JD.com + Square + Deepthi Sigireddi tech-talk content; the 2020 original likely had only the Pinterest + Etsy + YouTube-origin narrative. The production-user numbers and the "PlanetScale is the only managed Vitess" claim are 2023-vintage, not 2020.
- No mechanism depth on "how Vitess removes the complexity". The post gestures at Vitess features (query routing, failover, backup, pooling) without explaining the vtgate / vttablet / vschema primitive split that makes these possible. For that depth, see sources/2026-04-21-planetscale-what-makes-up-a-planetscale-vitess-database (Morrison's three-Vitess-primitives pedagogy) and systems/vitess.
Source¶
- Original: https://planetscale.com/blog/horizontal-sharding-for-mysql-made-easy
- Raw markdown: raw/planetscale/2026-04-21-horizontal-sharding-for-mysql-made-easy-ead9aa3f.md
Related¶
- systems/vitess — origin at YouTube 2010, the sharding substrate below PlanetScale MySQL.
- systems/planetscale — managed Vitess, positioned as "Vitess without the team required to run Vitess".
- systems/mysql — the underlying storage engine Vitess shards.
- concepts/horizontal-sharding — the parent concept.
- concepts/vertical-scaling — the rung below on the scaling ladder.
- concepts/vertical-scaling-cost-step-function — the canonical cost-discontinuity framing introduced by this post.
- patterns/application-level-sharding — the pre-Vitess legacy pattern (Pinterest / Etsy).
- sources/2026-04-21-planetscale-dealing-with-large-tables — operator-playbook depth on horizontal sharding with a vtctldclient Reshard walkthrough.
- sources/2026-04-21-planetscale-guide-to-scaling-your-database-when-to-shard-mysql-and-postgres — decision-framework companion.
- sources/2026-04-21-planetscale-three-surprising-benefits-of-sharding-a-mysql-database — the four-axis sharding-benefits framing.
- sources/2024-08-26-slack-unified-grid-how-we-re-architected-slack-for-our-largest-customers — the Slack Vitess-migration follow-up that names the shard-key re-axis.
- companies/planetscale — Tier-3 company page.