PlanetScale — Self-managed Vitess vs Managed Vitess with PlanetScale
Summary
A PlanetScale marketing comparison (Holly Guevara, 2024-05-24) arguing that running Vitess yourself versus using PlanetScale is a classic build-vs-buy decision dominated by team sizing and ongoing operational burden, not raw infrastructure cost. The substantive content is concentrated in two places: concrete operational numbers (team sizing per workload band, Vitess release cadence) that the rest of the wiki corpus does not capture, and compressed retellings of two canonical self-run-Vitess war stories: Slack's 3-year, 0→2.3M QPS migration and Square Cash's sharding-under-growth odyssey. Both stories are referenced rather than re-explained, so this source page's value is the numbers and the framing, not new architecture.
Key takeaways
- Self-run Vitess needs a dedicated DBA team whose size scales with workload, not with Vitess features. For smaller/mid-sized applications running Vitess (200GB–1TB), PlanetScale reports "teams of around 3-6 DBAs focused on day-to-day database operations. Plus more (closer to 10) for the initial implementation." For a "much larger application (1+ TB), you're looking at at least 10 people at all times." These numbers anchor concepts/self-managed-vs-managed-vitess-cost. (Source: this article.)
- Vitess ships 2–3 major versions per year. "The Vitess team typically releases 2-3 major versions per year. This means it's easy to fall behind, especially when you have a smaller team managing your Vitess instances. We frequently onboard customers that are 3+ versions behind who just didn't have the time to keep up with testing and implementing the upgrades." This pins a concrete release cadence on Vitess that the rest of the wiki corpus does not quantify — see concepts/vitess-release-cadence. (Source: this article.)
- MySQL compatibility prep is a non-trivial blocker, not a checkbox. "You can take a look through the MySQL compatibility documentation to get an idea of what kind of prep work you may need to do. For example, we commonly see stored procedures and CTEs as blockers." Teams must refactor application code before migration can start — this is the pre-migration tax captured at concepts/mysql-compatibility-gap. (Source: this article.)
- Slack's Vitess journey took 3 years (July 2017 → late 2020) and carried Slack from 0 QPS to 2.3M QPS on Vitess. "There are many other stories to tell in these 3 years of migrations. Going from 0% to 99% adoption also meant going from 0 QPS to the 2.3 M QPS we serve today." The migration shape was proof-of-concept-first: "We decided to build a prototype demonstrating that we can migrate data from our traditional architecture to Vitess" — a small RSS-feed-into-Slack-channel feature, not a critical service. This anchors patterns/proof-of-concept-before-full-migration as the canonical Vitess-adoption playbook. The PoC required reworking "operational processes for provisioning deployments, service discovery, backup/restore, topology management, credentials, and more," plus building "a generic backfill system for cloning the existing tables while performing double-writes from the application, and a parallel double-read diffing system" — i.e. the dual-write/dual-read migration tax shows up even at PoC scope. (Source: this article.)
- Slack became one of the largest upstream Vitess contributors as a side effect of adoption. "To get Vitess working for Slack, they even became one of the largest contributors to the Vitess repo." This is the patterns/contribute-upstream-during-migration pattern — self-run adoption at scale forces you to fix the substrate, and the fixes become upstream contributions. PlanetScale's own positioning is the inverse: "we employ roughly 75% of the Vitess maintainers" — the vendor-maintainer concentration is itself part of the managed-vs-self-run calculus. (Source: this article.)
- Square Cash's post-migration operational issues are the shape of what self-run Vitess teams actually debug. Post-initial-migration, Cash hit "deadlocks that caused outages, handling scatter queries, keeping transactions ACID, resharding, and much more." Each of these is a distinct failure mode distilled into concepts/distributed-deadlock-on-sharded-mysql, concepts/scatter-query-from-legacy-code, and (already-existing) concepts/cross-shard-query. Critical engineering quote about cross-shard transactions in money-handling code: "Other users of Vitess (like YouTube) can make different trade offs — maybe dropping a comment every once in a while isn't the end of the world for them. But not us. So the first thing we had to do was change our application code so that it wouldn't do cross shard transactions in critical money-processing portions of the code." (Source: this article.)
- "Infra cost parity, but team-cost delta" is the canonical managed-vs-self-run claim. "With a PlanetScale Enterprise plan, the raw infrastructure costs are usually in line with what you'd pay running Vitess on your own. Apart from your cloud bill for the infrastructure, you'll pay PlanetScale for management and enterprise support. The PlanetScale costs scale with the resources and storage that you provision." The savings lever is people-time, not machine-time. This is the pricing-model framing captured at patterns/build-vs-buy-managed-database. (Source: this article.)
- Resharding arrives sooner than teams expect. "Teams sometimes end up facing the challenge of resharding much sooner than anticipated." The implication is that the initial sharding scheme often does not survive growth, and the first reshard is the test of whether your operational runbook actually works. PlanetScale positions this as a hands-on-support moment. (Source: this article.)
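The double-write and double-read diffing approach quoted in the Slack takeaway above can be illustrated with a minimal sketch. It is not Slack's implementation: plain dictionaries stand in for the legacy database and for Vitess, and the names `DualWriteStore`, `write`, `backfill`, and `read` are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class DualWriteStore:
    """Hypothetical wrapper: the legacy store remains the source of truth
    while the new store is populated and verified."""
    legacy: Dict[str, Any] = field(default_factory=dict)
    new: Dict[str, Any] = field(default_factory=dict)
    mismatches: List[Tuple[str, Any, Any]] = field(default_factory=list)

    def write(self, key: str, value: Any) -> None:
        # Double-write: every application write goes to both stores.
        self.legacy[key] = value
        self.new[key] = value

    def backfill(self) -> int:
        # Copy rows that predate double-writing. Keys already present in the
        # new store are skipped, since a double-write may have refreshed them.
        copied = 0
        for key, value in self.legacy.items():
            if key not in self.new:
                self.new[key] = value
                copied += 1
        return copied

    def read(self, key: str) -> Optional[Any]:
        # Double-read: serve the legacy value, compare it with the new
        # store's value, and record any difference for later inspection.
        old_value = self.legacy.get(key)
        new_value = self.new.get(key)
        if old_value != new_value:
            self.mismatches.append((key, old_value, new_value))
        return old_value


if __name__ == "__main__":
    store = DualWriteStore(legacy={"a": 1, "b": 2})
    store.write("b", 20)          # lands in both stores
    print(store.backfill())       # number of rows copied by the backfill
    store.read("a")
    store.read("b")
    print(store.mismatches)       # diffs found by the double-read check
```

In a sketch like this, cutover would only be considered once the mismatch list stays empty under real read traffic. A production system would also need to handle concurrent writes during the backfill, which this example ignores.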
Operational numbers
| Dimension | Reported value | Source |
|---|---|---|
| DBA team size, 200GB–1TB workload | 3–6 DBAs steady state, ~10 for initial implementation | this article |
| DBA team size, 1TB+ workload | "at least 10 people at all times" | this article |
| Vitess major-version release cadence | 2–3 per year | this article |
| Typical version-lag at onboarding | 3+ major versions behind | this article |
| Slack's Vitess migration duration | ~3 years (Jul 2017 → late 2020) | Slack blog, cited |
| Slack's peak Vitess QPS | 2.3M | Slack blog, cited |
| PlanetScale share of Vitess maintainers | ~75% | this article |
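As a rough aid for reading the release-cadence and version-lag rows together, the sketch below converts "N major versions behind" into an approximate time since the last upgrade. The conversion is simple arithmetic on the article's 2–3 releases per year figure, not a number the article states itself, and the function name `lag_range_years` is hypothetical.

```python
from typing import Tuple


def lag_range_years(
    versions_behind: int,
    min_releases_per_year: float = 2.0,
    max_releases_per_year: float = 3.0,
) -> Tuple[float, float]:
    """Return (shortest, longest) estimated years since the last upgrade.

    The default rates are the 2-3 major versions per year cited in the
    article; a faster release rate implies a shorter elapsed time.
    """
    if versions_behind < 0:
        raise ValueError("versions_behind must be non-negative")
    if min_releases_per_year <= 0 or max_releases_per_year < min_releases_per_year:
        raise ValueError("release rates must be positive and ordered")
    shortest = versions_behind / max_releases_per_year
    longest = versions_behind / min_releases_per_year
    return (shortest, longest)


if __name__ == "__main__":
    # A team that is 3 major versions behind at onboarding.
    print(lag_range_years(3))  # (1.0, 1.5)
```

Under these assumptions, being 3 major versions behind corresponds to roughly 1 to 1.5 years without a major upgrade, and "3+" means at least that long.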
Caveats
- Vendor-authored marketing comparison. The framing, feature emphasis, and numbers are PlanetScale's; they have a direct interest in making self-run look expensive and managed look inevitable. The operational numbers should be treated as PlanetScale's own field data from customer onboardings, not neutral industry benchmarks. Where possible the original Slack/Square sources (linked inline) are the canonical primary record and this page cites them as references.
- No architectural deep-dive on PlanetScale's own management layer. The post enumerates PlanetScale features (deploy requests, PlanetScale Global Network, PlanetScale Insights, SOC/PCI/HIPAA compliance, PlanetScale Connect) but does not add new internals to any of them — those are covered in the respective dedicated source pages. This page's unique contribution is the operational numbers and the managed-vs-self-run framing.
- "Hands-off"-ness is not quantified. PlanetScale claims "we even hold the pager for you, often detecting and mitigating any issues well before your team is even aware they existed", but the article does not give MTTR, alert-volume, or pager-burden numbers to underwrite this. Treat it as marketing colour, not an operational datum.
- Slack / Square numbers are ~5-year-old anecdata at time of writing (2024-05). Slack migrated 2017–2020; Cash started 2016. Vitess's compatibility gap has narrowed substantially since then (per the "The Vitess team is constantly working to close the gap" disclaimer in the article). Treat the team-sizing numbers as PlanetScale's 2024 estimate, and the company-specific timelines as historical anchors.
Source
- Original: https://planetscale.com/blog/self-run-vs-managed-vitess-with-planetscale
- Raw markdown: raw/planetscale/2026-04-21-self-managed-vitess-vs-managed-vitess-with-planetscale-ec186234.md
Related
- systems/vitess — the underlying substrate this article frames as "expensive to run yourself"
- systems/planetscale — the managed vendor positioned as the alternative
- systems/mysql — what Vitess shards; the compatibility layer is what creates the migration tax
- companies/planetscale — company index with cross-article synthesis
- concepts/vitess-release-cadence — 2–3 major versions/year
- concepts/self-managed-vs-managed-vitess-cost — team-sizing anchor
- concepts/vitess-migration-timeline — Slack-3-years anchor
- concepts/mysql-compatibility-gap — the pre-migration refactor tax
- concepts/distributed-deadlock-on-sharded-mysql — Square Cash post-migration war story
- concepts/scatter-query-from-legacy-code — Square Cash post-migration war story
- patterns/build-vs-buy-managed-database — infra-cost-parity / people-cost-delta framing
- patterns/proof-of-concept-before-full-migration — Slack's PoC-first adoption shape
- patterns/contribute-upstream-during-migration — side-effect of self-run adoption at scale