
Shard-key as database_id

Using database_id as the shard key is the canonical shape for a telemetry / multi-tenant store where each customer database is the unit of isolation and no workflow joins across customer databases. The shard key is the customer's database identifier; all telemetry rows keyed on that identifier live on a single shard.

This is a special case of tenant-id sharding with two load-bearing properties:

  1. No cross-shard queries by design. The Insights UI always displays data for one customer database at a time; a query never spans customers. Queries are therefore always shard-local; no scatter-gather.
  2. Even distribution across shards. Hashing the database ID spreads customers roughly uniformly, provided the customer population is large relative to the shard count. Insights' many-thousands-of-customer fleet easily clears that cardinality bar.
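Both properties can be illustrated with a toy hash-routing sketch. The shard count, hash choice, and function name here are illustrative assumptions, not Insights' actual implementation:

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # illustrative; not PlanetScale's real shard count

def shard_for(database_id: str) -> int:
    """Stable hash of the tenant's database ID -> shard index."""
    digest = hashlib.sha256(database_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Property 1: every row for one tenant maps to the same shard,
# so a tenant-scoped query is always shard-local.
assert shard_for("db-1234") == shard_for("db-1234")

# Property 2: with many tenants, hashing spreads them roughly evenly.
counts = Counter(shard_for(f"db-{i}") for i in range(10_000))
```

With 10,000 simulated tenants over 16 shards, each shard lands near the 625-tenant mean; the evenness comes from tenant cardinality, not from any balancing logic.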

(Source: sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights.)

Canonical PlanetScale framing

Rafer Hazen, 2023-08-10: "Database ID works well as a shard key because we never need to join data across customer databases, and it results in a fairly even distribution of data across shards."

The rule of thumb is:

  • If the product's access pattern is always tenant-scoped, shard by tenant ID.
  • If a small fraction of accesses span tenants (e.g. fleet-wide reporting), shard by tenant ID and serve the cross-tenant workflow from a separate pipeline (Prometheus, data warehouse).
  • If many accesses span tenants, tenant-ID sharding is wrong — pick a different shard key or a different partition strategy.
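The rule of thumb above can be written down as a toy decision function. The threshold separating "a small fraction" from "many" is an illustrative assumption, not from the source:

```python
from enum import Enum

class ShardStrategy(Enum):
    TENANT_ID = "shard by tenant ID"
    TENANT_ID_PLUS_PIPELINE = "shard by tenant ID; serve cross-tenant views from a separate pipeline"
    OTHER_KEY = "tenant-ID sharding is wrong; pick a different shard key or partition strategy"

def choose(cross_tenant_fraction: float) -> ShardStrategy:
    """Map the fraction of accesses that span tenants to a sharding strategy."""
    if cross_tenant_fraction == 0.0:
        return ShardStrategy.TENANT_ID
    if cross_tenant_fraction < 0.05:  # illustrative cutoff for "a small fraction"
        return ShardStrategy.TENANT_ID_PLUS_PIPELINE
    return ShardStrategy.OTHER_KEY
```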

Why this works for telemetry specifically

Telemetry is usually tenant-scoped because the reader is tenant-scoped: a user of PlanetScale Insights is asking questions about their database, not the fleet. The shape of the question matches the shape of the storage.

The fleet-wide view ("what is the 99th-percentile query latency across all PlanetScale customers?") lives in Prometheus (systems/prometheus) — the low-cardinality half of the hybrid telemetry store.
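The resulting split can be sketched as a routing decision: tenant-scoped, high-cardinality queries go to the tenant's MySQL shard, while fleet-wide aggregates come from the low-cardinality store. Names and the shard count are hypothetical:

```python
import hashlib

NUM_SHARDS = 16  # illustrative

def shard_for(database_id: str) -> int:
    digest = hashlib.sha256(database_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def route(query: dict):
    """Return (store, shard) for a telemetry query."""
    if query.get("database_id") is not None:
        # Tenant-scoped, high-cardinality: shard-local lookup in sharded MySQL.
        return ("sharded-mysql", shard_for(query["database_id"]))
    # Fleet-wide, low-cardinality aggregate: the Prometheus half of the hybrid.
    return ("prometheus", None)
```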

Hot-shard risk

Tenant-ID sharding carries a hot-shard risk when one tenant is much larger than the others. In Insights' published framing, that risk is absorbed by the combination of:

  • A generous unique-pattern-per-interval cap at the VTGate instrumentation layer — the largest customers still don't regularly exceed it.
  • Small-shards-wide-fleet posture (patterns/small-shards-wide-fleet) — if a shard runs hot, operators have the option of scaling that specific shard up (2 vCPU → 4 / 8 vCPU), or splitting to more shards via the dual-write-branch-cutover pattern.
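The first mitigation can be sketched as a per-interval cap at the instrumentation layer: once a tenant has reported the maximum number of distinct query fingerprints in the current interval, further new fingerprints fold into an overflow bucket instead of creating new rows. The cap value, class, and "<other>" bucket are illustrative assumptions, not PlanetScale's actual implementation:

```python
from collections import defaultdict

PATTERN_CAP = 1000  # illustrative cap on distinct fingerprints per interval

class IntervalStats:
    """Per-tenant, per-interval accumulator with a unique-pattern cap."""

    def __init__(self):
        self.patterns = defaultdict(int)  # fingerprint -> observation count

    def record(self, fingerprint: str) -> None:
        if fingerprint in self.patterns or len(self.patterns) < PATTERN_CAP:
            self.patterns[fingerprint] += 1
        else:
            # Cap reached: coalesce new fingerprints into one overflow row,
            # so row count per tenant per interval stays bounded.
            self.patterns["<other>"] += 1
```

This bounds a single tenant's row volume per interval, which is what keeps one outsized customer from overwhelming its shard.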

Comparison to fingerprint-as-shard-key

An alternative shape would shard by (database_id, fingerprint) — a two-level grouping. PlanetScale opted against this: (database_id, fingerprint) is the Kafka key (for partition affinity and in-memory coalescing), but the MySQL shard key is database_id alone. The rationale is implicit but structural: per-fingerprint sharding would fragment a single customer's data across shards, breaking the clean "tenant-scoped query has no cross-shard join" property.
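The two keys can be sketched side by side. The function names are hypothetical, but the structural point (fingerprint included in the Kafka key, excluded from the shard key) matches the source:

```python
def kafka_key(database_id: str, fingerprint: str) -> bytes:
    """Kafka message key: one query pattern's updates land on one
    partition, enabling in-memory coalescing before the MySQL write."""
    return f"{database_id}:{fingerprint}".encode()

def mysql_shard_key(database_id: str, fingerprint: str) -> str:
    """MySQL shard key: fingerprint is intentionally excluded, so every
    row for a tenant lives on one shard and tenant-scoped queries never
    cross shards."""
    return database_id
```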
