PATTERN
Small shards, wide fleet¶
Problem. A sharded cluster can be built out of a few large machines (each holding a large shard) or many small machines (each holding a small shard). The two postures produce very different operational profiles:
- Few large shards: each shard is operationally expensive — backups take longer, schema changes take longer, restore takes longer. Vertical scale-up is limited by the largest cloud instance type.
- Many small shards: each shard is operationally cheap. Backups and schema changes run in parallel across the fleet and finish in shard-size time. Vertical scale-up of each shard has ample headroom — small shards can grow 5–10× before hitting instance ceilings.
Solution. Prefer small per-shard hardware (2 vCPU / 2 GB on PlanetScale Insights) paired with a wider fleet. The wider fleet exploits:
- Parallelism on maintenance (backup, schema change) — wall-clock latency scales with per-shard size, not total data size.
- Two vertical escape hatches — if a shard runs hot, operators can scale it up to a larger instance OR split its data to more shards (via dual-write branch cutover or Vitess live reshard).
- Smaller blast radius per shard loss.
(Source: sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights.)
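The parallelism claim can be made concrete with a back-of-envelope model. All figures below (total data, throughput) are illustrative assumptions, not PlanetScale numbers:

```python
# Back-of-envelope model: maintenance wall clock tracks per-shard size,
# not total data size, because backups run in parallel across shards.

def backup_wall_clock_hours(total_data_gb, num_shards, backup_gb_per_hour=100):
    # One backup per shard, all in parallel: any shard's duration
    # (they're equal here) sets the fleet-wide wall clock.
    per_shard_gb = total_data_gb / num_shards
    return per_shard_gb / backup_gb_per_hour

TOTAL_GB = 1600
few_large = backup_wall_clock_hours(TOTAL_GB, num_shards=2)    # 800 GB/shard
many_small = backup_wall_clock_hours(TOTAL_GB, num_shards=32)  # 50 GB/shard
print(few_large, many_small)  # 8.0 hours vs 0.5 hours
```

Same total data either way; only the shard width changes the wall clock.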
Canonical PlanetScale Insights framing¶
Rafer Hazen, 2023-08-10: "We've successfully run the Insights database cluster on fairly small machines (2 vCPUs and 2GB memory). A larger number of smaller shards keeps backups and schema changes fast, gives us the option of quickly scaling up to larger hardware if we encounter an unexpected throughput increase, and gives us breathing room to backfill a new cluster with more shards when necessary."
The three load-bearing properties:
- Backups and schema changes finish in per-shard time. Every shard operates independently; parallelism is limited by the fleet's control-plane throughput, not any one shard's wall clock.
- Scale-up headroom is intact. 2 vCPU → 16 vCPU alone is 8× of vertical capacity per shard, with larger instance types beyond that. A fleet of already-maxed-out large shards has no vertical escape.
- Backfill-to-more-shards is cheap. The 8-day dual-write window to a larger shard count is fast because each shard is small; during the window each new shard only has to accumulate its share of 8 days × steady-state ingest.
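The backfill arithmetic behind the third property can be sketched as follows. The ingest rate and shard counts here are hypothetical, chosen only to show the shape of the calculation:

```python
# Sketch of backfill sizing: during the dual-write window, total ingest
# is spread across the new (wider) fleet, so each new shard stays small.

def backfill_gb_per_new_shard(window_days, cluster_ingest_gb_per_day, num_new_shards):
    # Data written during the window, divided across the new shards.
    return window_days * cluster_ingest_gb_per_day / num_new_shards

per_shard = backfill_gb_per_new_shard(
    window_days=8, cluster_ingest_gb_per_day=160, num_new_shards=16)
print(per_shard)  # 80.0 GB lands on each new shard
```

Because each new shard absorbs only its slice of the window's ingest, the backfill finishes in window time regardless of how large the old cluster had grown.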
Contrast with large-shard posture¶
Many deployments land on a few-large-shards posture inadvertently — they grow each shard vertically to avoid the operational pain of adding shards, until each shard is a large instance with large data. The resulting fleet has:
- Backups measured in hours, not minutes.
- Schema changes at per-shard-data-size wall clock.
- Vertical scale-up limited by instance-type ceilings.
- Reshard-to-more-shards expensive because each shard has so much data to split.
PlanetScale's Insights posture inverts the default: take the operational-cost penalty of more shards up front, collect the downstream operational benefits in every recurring maintenance cycle.
When this is wrong¶
- Cross-shard queries are frequent. Wider fleet → more scatter-gather overhead, more network hops, more cross-shard-tx coordination. If the workload isn't shard-local, wide-fleet makes the query side worse faster than it makes the maintenance side better.
- Per-shard overhead is not negligible. Some storage engines / databases have per-instance overhead (metadata cache, connection-fixed memory, fixed-cost WAL buffers) that doesn't shrink linearly with data size. Running on tiny instances wastes a higher fraction of RAM on fixed-cost overhead.
- Workload is read-heavy with small dataset. Three small shards serving a small dataset is strictly worse than one medium shard — less memory for the buffer pool, more round-trips for scatter-gather.
Insights escapes all three: data is shard-local by design (database_id shard key), the MySQL engine runs fine on 2 GB RAM for this workload, and the workload is write-heavy with linear per-shard write scaling.
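The first failure mode — scatter-gather overhead growing with fleet width — can be illustrated with a toy cost model. The latency parameters are purely illustrative assumptions:

```python
# Toy cost model: a cross-shard query fans out to every shard, so
# coordinator work grows with fleet width even as per-shard
# maintenance cost shrinks.

def scatter_gather_cost_ms(num_shards, parallel_hop_ms=1.0, merge_ms_per_shard=0.2):
    # Shard requests go out in parallel (one hop of network latency),
    # but the coordinator still pays per-shard merge work on the return.
    return parallel_hop_ms + num_shards * merge_ms_per_shard

narrow = scatter_gather_cost_ms(num_shards=4)
wide = scatter_gather_cost_ms(num_shards=64)
print(narrow, wide)  # the wide fleet pays ~7x more per cross-shard query
```

The same widening that makes backups cheap makes every non-shard-local query more expensive, which is why the pattern presupposes a shard-local workload.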
Composition with database_id sharding¶
Small-shards-wide-fleet works especially well with tenant-ID sharding (concepts/shard-key-database-id) because the natural customer distribution is usually wide enough to saturate a wide fleet. A high customers-per-shard average yields even distribution.
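The distribution property can be sketched with a minimal routing example. This assumes simple hash-mod placement for illustration; Vitess actually maps keyspace-ID ranges to shards, so treat this as a model of the even-spread claim, not the real routing layer:

```python
# Minimal sketch of tenant-ID routing with shard key database_id,
# assuming hash-mod placement (not Vitess's actual range mapping).
import hashlib
from collections import Counter

NUM_SHARDS = 8

def shard_for(database_id: int) -> int:
    # Hash the tenant ID and reduce to a shard index.
    digest = hashlib.sha256(str(database_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# With many tenants per shard on average, hash placement spreads
# tenants roughly evenly across the fleet.
counts = Counter(shard_for(i) for i in range(10_000))
print(sorted(counts.values()))  # each shard holds roughly 1250 tenants
```

The evenness holds only when tenants are numerous relative to the shard count; a fleet wider than its tenant population would sit partly idle.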
Seen in¶
- sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights — canonical wiki disclosure. PlanetScale's 8-shard Insights cluster runs each shard on 2 vCPU / 2 GB machines; Hazen explicitly canonicalises the operational-benefit rationale (backup speed, schema change speed, scale-up headroom, backfill flexibility).