Shard-key as database_id¶
Using database_id as the shard key is the canonical shape for a multi-tenant telemetry store where each customer database is the unit of isolation and no workflow joins across customer databases. The shard key is the customer's database identifier; all telemetry rows keyed on that identifier live on a single shard.
This is a special case of tenant-id sharding with two load-bearing properties:
- No cross-shard queries by design. The Insights UI always displays data for one customer database at a time; a query never spans customers. Queries are therefore always shard-local; no scatter-gather.
- Even distribution across shards. Hashing the database ID spreads customers uniformly across shards, provided the tenant population is large enough; Insights' fleet of many thousands of customer databases clears that cardinality bar.
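Both properties can be illustrated with a toy hash-based router. This is a sketch, not Vitess's actual routing (Vitess maps hashed keyspace-ID ranges to shards); the hash function and shard count here are assumptions for illustration:

```python
import hashlib

NUM_SHARDS = 16

def shard_for(database_id: str) -> int:
    """Map a customer database ID to a shard by hashing it.
    Illustrative stand-in for Vitess's keyspace-ID routing."""
    digest = hashlib.sha256(database_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Every row for one customer hashes to the same shard, so a
# tenant-scoped query touches exactly one shard: no scatter-gather.
counts = [0] * NUM_SHARDS
for i in range(10_000):  # a fleet of many thousands of tenants
    counts[shard_for(f"db-{i}")] += 1

# With enough tenants, per-shard load is close to uniform.
print("min/max tenants per shard:", min(counts), max(counts))
```

The uniformity depends on tenant cardinality: with only a handful of very different-sized tenants, hashing cannot balance load, which is why the hot-shard discussion below matters.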
(Source: sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights.)
Canonical PlanetScale framing¶
Rafer Hazen, 2023-08-10: "Database ID works well as a shard key because we never need to join data across customer databases, and it results in a fairly even distribution of data across shards."
The rule of thumb is:
- If the product's access pattern is always tenant-scoped, shard by tenant ID.
- If a small fraction of accesses span tenants (e.g. fleet-wide reporting), shard by tenant ID and serve the cross-tenant workflow from a separate pipeline (Prometheus, data warehouse).
- If many accesses span tenants, tenant-ID sharding is wrong — pick a different shard key or a different partition strategy.
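The three-way rule of thumb above can be encoded as a small decision helper. The 5% threshold separating "a small fraction" from "many" is an illustrative assumption, not a number from the source:

```python
def choose_shard_key(cross_tenant_fraction: float) -> str:
    """Encode the tenant-ID sharding rule of thumb.
    cross_tenant_fraction: share of accesses that span tenants.
    The 0.05 cutoff is an illustrative assumption."""
    if cross_tenant_fraction == 0.0:
        return "shard by tenant ID; every query is shard-local"
    if cross_tenant_fraction < 0.05:
        return "shard by tenant ID; serve cross-tenant reads from a separate pipeline"
    return "do not shard by tenant ID; pick another shard key or partition strategy"
```

Insights sits in the first bucket for serving queries, and pushes its one cross-tenant workflow (fleet-wide metrics) into the second bucket's "separate pipeline" arm via Prometheus.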
Why this works for telemetry specifically¶
Telemetry is usually tenant-scoped because the reader is tenant-scoped: a user of PlanetScale Insights is asking questions about their database, not the fleet. The shape of the question matches the shape of the storage.
The fleet-wide view ("what is the 99th-percentile query latency across all PlanetScale customers?") lives in Prometheus (systems/prometheus) — the low-cardinality half of the hybrid telemetry store.
Hot-shard risk¶
Tenant-ID sharding has a hot-shard risk when one tenant is much larger than others. Insights' published framing says the hot-shard risk is absorbed by the combination of:
- A generous unique-pattern-per-interval cap at the VTGate instrumentation layer — the largest customers still don't regularly exceed it.
- Small-shards-wide-fleet posture (patterns/small-shards-wide-fleet) — if a shard runs hot, operators have the option of scaling that specific shard up (2 vCPU → 4 / 8 vCPU), or splitting to more shards via the dual-write-branch-cutover pattern.
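The cap mechanism can be sketched as a bounded per-interval aggregator. The cap value, class name, and method shape are illustrative assumptions; the source only states that a unique-pattern-per-interval cap exists at the instrumentation layer:

```python
from collections import defaultdict

PATTERN_CAP = 10_000  # illustrative; the real cap value is not disclosed

class IntervalAggregator:
    """Coalesce per-query stats into (database, fingerprint) buckets for
    one flush interval, bounding distinct fingerprints per database."""

    def __init__(self, cap: int = PATTERN_CAP):
        self.cap = cap
        self.buckets = defaultdict(dict)  # database_id -> {fingerprint: count}

    def record(self, database_id: str, fingerprint: str) -> bool:
        patterns = self.buckets[database_id]
        if fingerprint not in patterns and len(patterns) >= self.cap:
            return False  # cap hit: new patterns dropped, bounding row growth
        patterns[fingerprint] = patterns.get(fingerprint, 0) + 1
        return True

agg = IntervalAggregator(cap=2)
assert agg.record("db-1", "SELECT * FROM t WHERE id = ?")
assert agg.record("db-1", "UPDATE t SET x = ? WHERE id = ?")
assert not agg.record("db-1", "DELETE FROM t WHERE id = ?")  # over cap
assert agg.record("db-1", "SELECT * FROM t WHERE id = ?")    # existing pattern still counted
```

Bounding distinct patterns per interval puts a ceiling on how many rows even the largest tenant can write per flush, which is what keeps a big tenant from turning its shard hot through sheer pattern cardinality.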
Comparison to fingerprint-as-shard-key¶
An alternative shape would shard by (database_id, fingerprint), a two-level grouping. PlanetScale opted against this: they use (database_id, fingerprint) as the Kafka key (for partition affinity and in-memory coalescing), but the MySQL shard key is just database_id. The rationale is implicit but structural: per-fingerprint sharding would fragment a single customer's data across shards, breaking the clean "tenant-scoped query has no cross-shard join" property.
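The split between the two keys can be made concrete with a sketch. The hash function, partition count, and shard count are illustrative assumptions; the point is which fields each routing decision sees:

```python
import hashlib

NUM_KAFKA_PARTITIONS = 32
NUM_SHARDS = 16

def _bucket(key: str, n: int) -> int:
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

def kafka_partition(database_id: str, fingerprint: str) -> int:
    # Kafka is keyed on (database_id, fingerprint): one query pattern
    # always lands on one partition, enabling in-memory coalescing.
    return _bucket(f"{database_id}:{fingerprint}", NUM_KAFKA_PARTITIONS)

def mysql_shard(database_id: str) -> int:
    # MySQL shard routing sees only database_id: the fingerprint can
    # never fragment a customer's rows across shards.
    return _bucket(database_id, NUM_SHARDS)

fingerprints = ["select_by_id", "update_by_id", "delete_by_id"]
shards = {mysql_shard("db-42") for _ in fingerprints}
assert len(shards) == 1  # every pattern for db-42 shares one shard
```

Because `mysql_shard` never takes the fingerprint as input, the "tenant-scoped query has no cross-shard join" property holds by construction rather than by convention.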
Seen in¶
- sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights — canonical disclosure of PlanetScale Insights' tenant-ID sharding posture and source of the Hazen quote above.