CONCEPT Cited by 3 sources
Horizontal sharding¶
Horizontal sharding splits a single logical table (or group of related tables) so that its rows live across multiple physical database instances. Each row is assigned to a shard by a function of its shard key (commonly hash(shard_key) → shard_id). Orthogonal to vertical partitioning (which splits different tables across databases, keeping each whole table intact on one instance).
Why take this on¶
- Vertical partitioning ceiling: once an individual table exceeds the host DB's practical limits (TB-scale heap / billions of rows / vacuum stalls / IOPS ceilings), vertical partitioning can't help — the smallest unit of vertical partitioning is a single table (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).
- Transparent future scale-outs: once a table is horizontally sharded at the application layer, it supports any number of physical shards; future scale-outs are a shard-split operation with no application-level change.
What you give up vs a single-node relational DB¶
- SQL feature set shrinks. Foreign keys, globally unique indexes, and certain join / aggregation / nested-SQL patterns become inefficient or infeasible to enforce across shards. Unique indexes often only survive if they include the shard key.
- Schema changes must coordinate across all shards.
- Transactions can span multiple shards → partial commit failures become a product-visible failure mode (the canonical illustration: "moving a team between two orgs, only to find half their data was missing" — Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).
- Atomic cross-shard transactions require either a distributed-transaction protocol or product-layer compensation logic; many implementations (like Figma) deliberately skip them and make the product resilient to cross-shard failures.
- Queries without a shard-key filter become scatter-gather — fan out to every shard, aggregate back. Expensive to implement correctly and a scale cap (each SG touches every shard → same load as unsharded).
Two axes to design along¶
- Shard-key selection (concepts/shard-key). One key across all tables vs a small set of keys per-domain (Figma's choice:
UserID,FileID,OrgID— each covers almost every table; tables sharing a key are grouped into colos). One-key-fits-all often forces a composite key + backfill + app refactor; per-domain keys require a colocation scheme but minimize churn. - Logical vs physical split (concepts/logical-vs-physical-sharding). The "serves as if sharded" state can be decoupled from the "data physically across multiple hosts" state so the riskier physical failover happens only after the application layer is trusted end-to-end.
Build vs buy¶
Existing horizontally-sharded SQL/NoSQL options: CockroachDB, TiDB, Spanner, Vitess (and the NoSQL family). Adopting any of these typically means a full cross-store data migration + rebuilding operational expertise on the new substrate. Figma's 2022 decision to build in-house on top of vertically-partitioned RDS Postgres was driven by aggressive runway pressure + deep operational expertise + a relational data model NoSQL couldn't express (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale). The explicit corollary: once initial runway is bought, the in-house choice is worth re-evaluating against mature NewSQL / managed options — the choice was shaped by the deadline, not by long-term preference.
Distinction from auto-sharding / shard management¶
Horizontal sharding at the SQL database level (this page) is about splitting a table's rows across DB instances. "Dynamic sharding" at the shard-manager level (e.g. systems/dicer) is about assigning stateful service pods to key ranges with a controller. Complementary axes: horizontal sharding answers where does the row go, dynamic sharding answers which pod owns the range now.
Distinction from MySQL partitioning¶
Horizontal sharding is cross-server; MySQL partitioning is same-server. MySQL partitioning splits a table into multiple on-disk partitions on the same machine — a vertical-scaling lever. Horizontal sharding splits a table's rows across multiple separate machines — a horizontal-scaling lever. Both answer "which subset of rows goes where" via a function of the key (range / list / hash), but the physical unit differs. See concepts/partitioning-vs-sharding for the full one-page disambiguation: "partitioning occurs on the same server and is supported by MySQL natively, whereas sharding a database splits tables across different servers and requires external mechanisms to achieve this" (). PlanetScale / Vitess deliberately don't support MySQL partitioning because sharding subsumes its benefits and eliminates its single-server-fault-domain drawback.
Seen in¶
-
— Brian Morrison II (PlanetScale, 2023-11-20) canonicalises a four-axis framing of sharding's benefits: (1) throughput (the obvious one), (2) sharded failure-domain isolation — "in sharded environments, this failure domain is actually spread out" — a single-shard outage caps blast radius to ~1/N of customers, (3) operational ease (backups + ghost-table schema migrations run embarrassingly parallel across shards), (4) linear cost scaling via commodity storage tiers. The four benefits are additive, not overlapping — an 8-shard deployment simultaneously achieves 1/8 blast radius, 8× parallel backup throughput, 8× parallel ghost-table migration throughput, and 8× cheap-volume cost instead of 1× premium-volume cost. Tier-3 pedagogy voice; no production numbers; the load-bearing quantification for each benefit is on sibling canonical sources (Dicken 2024-07-30 backups, Dicken 2024-08-19 IOPS). This post is the first canonical wiki framing of sharded-failure-domain- isolation as a first-class benefit alongside the already- canonical backup-parallelism and IOPS-cost arguments.
-
— canonical wiki disclosure that Temporal's own architecture uses horizontal partitioning internally (workflows are sharded across the History subsystem; Shilkov 2021 is the canonical sizing reference). Longoria 2022-07-22 frames the horizontal-partitioning alignment as the reason PlanetScale / Vitess composes cleanly with Temporal as a persistence substrate: "Since Temporal also uses horizontal partitioning, Temporal maps effortlessly onto PlanetScale and can take full advantage of PlanetScale's scalability improvements over a single MySQL instance." Shard counts on the orchestrator side (Temporal History) and the storage side (Vitess) scale independently — a general architectural disclosure that applies beyond the Temporal/PlanetScale pairing to any sharded-orchestrator- on-sharded-store topology.
-
— canonical wiki disclosure of horizontal sharding's latency-bound regime. Savannah Longoria (PlanetScale, 2022-12-14) canonicalises Temporal's per-shard serialised-update discipline — "Temporal serializes all updates belonging to the same shard, so all updates are sequential. As a result, the latency of a database operation limits the maximum theoretical throughput of a single shard." First wiki datum that horizontal sharding is sometimes latency-bound rather than bandwidth-bound: per-shard throughput = 1 / persistence_latency, so you cannot upsize a shard out of saturation, only add more shards (and Temporal's shard count is immutable after cluster deploy). Companion worked example of a two-keyspace sharded-plus-unsharded split under a write-intensive workload. Empirical anchor: one customer's Temporal-on-PlanetScale setup sustained 40k–200k QPS through Black Friday / Cyber Monday 2022 with peaks to 180k, "no interruptions".
-
— Hazen, 2023-08-10, canonicalises
database_idas a natural tenant-scoped shard key with two load-bearing properties: "we never need to join data across customer databases" (no cross-shard queries) and "a fairly even distribution of data across shards." Paired with small-shards-wide- fleet (2 vCPU / 2 GB per shard) and 4 → 8-shard dual-write reshard via new cluster. PlanetScale's own Insights store dogfooded on this architecture. Canonical wiki datum: tenant-ID sharding ondatabase_idis the right default for multi-tenant telemetry when all workflows are tenant-scoped. -
— canonical wiki datum that unsharded-to-sharded migration is a dashboard-click operation when the substrate (Vitess) and the data-motion primitive (
MoveTables) support it. Ben Dicken's 2024-11-07 post canonicalises concepts/unsharded-to-sharded-migration as a distinct migration variant (vs external-source import and cross-cluster migration) and frames it as "shifts this phase of a company's existence from an extreme pain point to a smooth, well-tuned process" — i.e. the one-way-door scaling milestone becomes a revolving-door operation with reverse- replication rollback. The data-motion mechanics are genuinely compressed to a few clicks via systems/planetscale-workflows; the upstream shard-key design and cross-shard query audit work still requires the same engineering judgement the build-your-own-sharding path did (caveat preserved on source page). -
sources/2026-04-21-planetscale-increase-iops-and-throughput-with-sharding — Ben Dicken canonicalises horizontal sharding as an I/O-cost lever, alongside the classical data-size / write-throughput / read-throughput motivations. Each shard sees 1/N the aggregate IOPS + throughput demand, which keeps the per-shard demand below the cheap-volume-tier ceiling (AWS
gp3default 3,000 IOPS / 125 MiB/s). This avoids the super-linear cost multiplier of upgrading to provisioned-IOPS volumes (io1/io2). Worked example: an 8× workload that costs $20,520-$24,197/mo on unsharded RDS withio1costs $13,992/mo on an 8-shard PlanetScale cluster — a linear-in-shard-count multiplier vs the unsharded regime's 11-13× cost cliff. Canonical new patterns/sharding-as-iops-scaling pattern + companion to the commoditisation-of-sharding argument Berquist's 2023 post canonicalises ("sharding is no longer a last resort"). The two Dicken posts together bracket the two architectural answers to the IOPS-cost-cliff: shard on cheap volumes (2024-08-19) vs move to local NVMe (2025-03-13). PlanetScale's Metal tier (March 2025) is the structural implementation of the latter; this post pre-dates Metal and the post-publication Note flags Metal as the alternative. -
sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale — Figma's in-house horizontal sharding on RDS Postgres: vertical partitioning → horizontal sharding path, colos, hash-of-shard-key routing, logical-vs-physical decoupling via views, DBProxy query engine, shadow-readiness query-subset selection. First sharded table September 2023 with 10s partial primary availability / no replica impact.
- sources/2026-04-21-planetscale-dealing-with-large-tables — Ben Dicken (PlanetScale, 2024-07-10) canonicalises horizontal sharding as the top rung of a three-rung scaling ladder (vertical scaling → vertical sharding → horizontal sharding) and walks the raw
vtctldclient Reshard --source-shards '0' --target-shards '-40,40-80,80-c0,c0-' create→switchtrafficsequence with a hypotheticalexercise_logworked example. Canonical wiki instance of thelog_idvsuser_idshard-key teaching example: hashing bylog_iddistributes evenly but makes per-user reads "terrible for performance" because the user's logs scatter across all shards; hashing byuser_idroutes every per-user query to a single shard. Also canonicalises the four structural benefits of horizontal sharding (write throughput, backup speed, failure isolation, cost) and the resharding-is-not-one-way-door property viaReshard-online-via-VReplication. Canonical new hot-shard-write-frontier concept + shard-key-aligned-with-query-pattern pattern + reshard-online-via-vreplication pattern. - — Berquist's decision-framework post canonicalises the three sharding-trigger signals (data size, write throughput, read throughput) and the commoditisation-of-sharding argument ("sharding is no longer a last resort" since Vitess became open-source). Horizontal sharding framed as the top rung of the scaling ladder — reached after vertical scaling, read-replicas, and vertical sharding are exhausted. Named webscaler precedents: TAO (Facebook), Gizzard (Twitter), Vitess (YouTube).
- — Lucy Burns + Taylor Barnett (PlanetScale, 2023-08-31, originally 2020-10-22) canonicalise the pre-Vitess-era historical framing and the cost-curve economic argument for horizontal sharding. Names Pinterest (2012, Marty Weiner: "We had several NoSQL technologies, all of which eventually broke catastrophically") and Etsy (two-way lookup + shard-packing) as the canonical application-level sharding precedents, and the three compounding failure modes (routing logic per-feature / cross-shard features reimplemented in app / resharding-as-daunting-operational-challenge) that drove the shift to a substrate layer. Canonicalises concepts/vertical-scaling-cost-step-function — the "next you invest in could easily be 50% larger, while you really only end up using 10% of it" overprovisioning-discontinuity — as an economic complement to the operational argument for horizontal sharding. Adds three novel production-user datapoints: JD.com's 35 million QPS on Singles Day, Slack's full Vitess migration surviving the WFH 2020 transition, and Square's Cash app on Vitess. Names 2010 as the Vitess origin year at YouTube.
- — Jonah Berquist (PlanetScale, 2022-09-01) provides the empirical demonstration of linear horizontal-sharding throughput scaling via a
sysbench-tpccbenchmark across a shard-count scan: 16 shards = 420k QPS, 32 shards = 840k QPS, 40 shards = 1M+ QPS sustained over 5 minutes. The 16 → 32 step is a clean 2× — canonical wiki evidence that doubling shards approximately doubles the QPS ceiling on shard-key-aligned workloads. Paired new concept: concepts/linear-shard-count-throughput-scaling. Also canonicalises the per-configuration saturation signal concepts/latency-rises-before-throughput-ceiling — VTGate p99 spikes before QPS plateaus, and QPS-per-thread derivative goes negative (1024 → 2048 step gained more QPS than 2048 → 4096) before the absolute ceiling is hit. Canonical disambiguation: Vitess shard counts are not restricted to powers of 2 — the 40-shard run explicitly demonstrates arbitrary-shard-count support, sized to hit 1M QPS via32 * (1M / 840k) ≈ 38, rounded up to 40. Caveats: single-tenant enterprise deployment with non-default timeout tuning — capability demonstration, not shared-tenant SLO. -
sources/2024-08-26-slack-unified-grid-how-we-re-architected-slack-for-our-largest-customers — Slack's Unified Grid re-architecture canonicalises shard-key re-axis as architectural-migration precondition. Slack's original 2013 design sharded along workspace ID; session tokens carried the workspace ID to route queries. The pre-Unified-Grid Vitess migration re-sharded hot tables — notably
messagesby channel ID — off the workspace axis, so query routing no longer needed the workspace from the session token. This made the later workspace→org-wide session-scope re-axis tractable: APIs whose tables had already been re-sharded could simply drop the workspace-routing context; APIs on still-workspace-sharded tables needed the three-strategy fallback (prompt for workspace / iterate workspaces bounded at 50 / fallback ordering). Canonical framing on the source page: "data is decoupled from the old axis before the architecture re-axes." -
— Justin Gage (guest post, 2023-04-06) canonicalises sharding-vs-replication orthogonality verbatim ("Sharding splits data across multiple servers so each server holds only a subset of the total data, reducing write load and storage pressure. Replication copies the same data to multiple servers, improving read performance and redundancy. Many production systems use both") — a framing this wiki had only implicitly; Gage's explicit orthogonality-of-two-axes treatment is the load-bearing contribution. Also names the umbrella term partition strategy for the hash/range/directory choice and canonicalises colocation as a design requirement not an optimisation (Amazon
orders+productswalk-through: "ideally all of the data you need to answer a particular query exists on the same physical machine"). Introduces the "shard-native database" vocabulary as the architectural alternative to hand-rolled horizontal sharding (names systems/cloud-spanner + systems/cockroachdb + PlanetScale as the category); first wiki framing of the "why are you not using a database that does sharding for you?" decision frame. Tier-3 pedagogy voice; no empirical numbers, but the canonical-vocabulary contribution is non-trivial.
Related¶
- concepts/vertical-partitioning
- concepts/shard-key
- concepts/scatter-gather-query
- concepts/logical-vs-physical-sharding
- concepts/static-sharding / concepts/dynamic-sharding (shard-management axis)
- patterns/sharded-views-over-unsharded-db
- patterns/colocation-sharding
- patterns/shadow-application-readiness
- systems/dbproxy-figma