
PATTERN

Hybrid time-series store — Prometheus + sharded MySQL

Problem. Telemetry data has two very different shapes: a low-cardinality database-level axis (a few metrics, one per database, queries-per-second aggregated across all query patterns) and a high-cardinality per-pattern axis (10s of millions of unique fingerprints per day). A single storage engine priced for one axis is mis-priced for the other:

  • Prometheus / most TSDBs are efficient at low cardinality but price-punishing at high cardinality — the series index, RAM footprint, and query cost all scale linearly in unique label combinations.
  • A sharded relational database can absorb arbitrary cardinality by using the high-cardinality axis as a primary-key component — but at the cost of the built-in aggregation / query-acceleration features a TSDB gives you for free at low cardinality.
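The cost asymmetry in the first bullet can be made concrete with a back-of-the-envelope series count — an illustrative sketch, not PlanetScale's actual numbers:

```python
def series_count(label_cardinalities):
    """Upper bound on unique Prometheus series for a metric: the product
    of the cardinalities of its labels. Index size, RAM footprint, and
    query cost all scale with this number."""
    n = 1
    for c in label_cardinalities:
        n *= c
    return n

# Database-level axis: a few metrics x a fleet of databases (illustrative).
low = series_count([5, 10_000])        # 50,000 series: comfortable
# Per-pattern axis: the same metrics keyed by 10s of millions of fingerprints.
high = series_count([5, 30_000_000])   # 150,000,000 series: untenable
```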

Solution. Split the telemetry by cardinality profile and route each axis to the storage it fits:

  • Low-cardinality axis → Prometheus. Database-level metrics, fleet-level summary, operator dashboards of "one number per database." Prometheus' cardinality tolerance is generous for this scope.
  • High-cardinality axis → sharded relational DB. Per-fingerprint, per-actor, per-tag rows go to a horizontally sharded MySQL or Postgres cluster. Cardinality is bounded by the shard key's distribution rather than by Prometheus' series index.
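The routing decision above can be sketched as a small ingest-time switch. This is a minimal illustration under assumed interfaces — the metric names, the `route` function, and the sink objects are hypothetical, not PlanetScale's pipeline:

```python
# Hypothetical allow-list of database-level metrics (illustrative names).
LOW_CARDINALITY_METRICS = {"db_qps", "db_rows_read", "db_errors_total"}

def route(sample: dict, prometheus_sink: list, sql_sink: list) -> None:
    """Route one telemetry sample by its cardinality profile:
    database-level samples go to Prometheus; anything carrying a
    per-pattern fingerprint goes to the sharded relational store."""
    if sample["metric"] in LOW_CARDINALITY_METRICS and "fingerprint" not in sample:
        prometheus_sink.append(sample)   # low-cardinality axis
    else:
        sql_sink.append(sample)          # high-cardinality axis
```

Plain lists stand in for the two sinks; in practice these would be a remote-write client and an INSERT batcher.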

(Source: sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights.)

Canonical PlanetScale Insights application

Rafer Hazen, 2023-08-10: "The database-level data fits well into a time series database like Prometheus, but we run into several issues when trying to store the query-pattern level data in a time series database. On any given day, there are 10s of millions of unique query patterns being used across PlanetScale, and we anticipate that this number will continue growing. Most time series databases start having issues when dimension cardinality is that high. We evaluated the cost of storing this data in Prometheus and found that it would be expensive enough to be fairly unattractive."

And: "Given these requirements, we built a hybrid solution that uses Prometheus for database-level aggregations (where cardinality is suitably low) and a sharded PlanetScale database, backed by MySQL and Vitess, to store query-pattern level statistics and individual slow query events."

Structural properties

  • Workflow-scoped routing. An operator looking at fleet health hits the Prometheus side; an operator debugging a specific database's slow patterns hits the sharded-MySQL side. The product workflow already partitions by the right axis — the hybrid store just matches the split.
  • No cross-store joins. Because the two stores answer different questions, there's no need to merge them at query time.
  • Cost is aligned with cardinality-per-axis. The per-row cost is lowest on the storage engine whose indexing strategy matches the axis's cardinality shape.
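One way to picture "cardinality is bounded by the shard key's distribution": the tenant/database id picks the shard, and per-fingerprint cardinality lives inside each shard's local primary key rather than in a global series index. A hypothetical sketch — the shard count and key layout are assumptions, not PlanetScale's schema:

```python
import hashlib

NUM_SHARDS = 64  # illustrative

def shard_for(database_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash the tenant/database id to a shard. Each shard then absorbs
    arbitrary per-fingerprint row counts via a local primary key such as
    (database_id, fingerprint, bucket_start)."""
    digest = hashlib.sha256(database_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because all rows for one database land on one shard, the debugging workflow ("show me this database's slow patterns") stays a single-shard query.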

When this pattern is the wrong answer

  • Single-axis telemetry. If everything is either low cardinality or high cardinality but not both, pick the one engine that fits.
  • Cross-axis joins are critical. If you routinely answer "how does the fingerprint-level p95 correlate with the database-level load?", the hybrid store forces an application-layer join — potentially worse than one engine that handles both axes (e.g. ClickHouse).
  • OLAP databases become viable. Hazen explicitly considers this: "A wide variety of OLAP databases could also serve our needs here, but all of them involve significant operational overhead and a steep learning curve." Note that PlanetScale eventually migrated the high-cardinality side to ClickHouse (2026-03-24 enhanced-tagging post: "Both feed ClickHouse") — this pattern canonicalises the earlier, sharded-MySQL era.
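The application-layer join that the second bullet warns about might look like the following — a hedged sketch with hypothetical query functions, shown mainly to make the extra round trips and the in-process merge visible:

```python
def correlate(db_ids, query_prometheus, query_mysql):
    """Cross-axis question the hybrid store cannot answer in one engine:
    fetch database-level load from Prometheus and fingerprint-level p95
    from sharded MySQL, then merge the two result sets in application
    code. Two round trips plus a join the storage layer no longer does."""
    load = query_prometheus(db_ids)   # first round trip: {db_id: qps}
    p95 = query_mysql(db_ids)         # second round trip: {db_id: p95_ms}
    return {d: (load.get(d), p95.get(d)) for d in db_ids}
```

If queries like this are routine rather than exceptional, that is the signal to prefer a single engine that handles both axes.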

Contrast with single-engine ClickHouse / Druid / Pinot

OLAP columnar stores (ClickHouse, Apache Druid, Pinot) handle both low- and high-cardinality axes competently but introduce operational overhead (separate replication model, separate backup story, separate failure mode). The hybrid pattern avoids introducing a net-new storage engine by leveraging engines the operator already runs — Prometheus for fleet metrics, a sharded relational DB for multi-tenant storage — at the cost of duplicated telemetry pipelines.

The trade-off is operational simplicity vs unified query surface. PlanetScale's 2023 choice was simplicity — and was reversed once the team's OLAP operational experience crossed the viability threshold (ClickHouse migration).
