
Query-statistics telemetry

Query-statistics telemetry is the practice of tracking per-query execution statistics — count, rows returned, duration — at the query-path proxy layer, as opposed to the database engine itself. This gives the hosted database provider a complete view of query traffic without burdening the MySQL/Postgres server with performance-schema work, and without requiring clients to opt in to third-party APM.

The PlanetScale framing

David Graham's 2021 feature-launch post frames it precisely:

Starting today, all PlanetScale database branches now track statistics about each SQL query that has executed against it without any overhead. By using Vitess, which powers PlanetScale databases, to track how many times a query runs, how many rows it returned, and how long each query takes to complete, we provide a complete view of query traffic on the database.

sources/2021-08-31-planetscale-optimizing-sql-with-query-statistics

The three data points collected per query pattern:

| Dimension | What it tells you |
| --- | --- |
| Execution count | Frequency — is this query hot? |
| Rows returned | Volume — how much work was done? |
| Execution duration | Latency — is this query fast? |

Aggregated by normalised query pattern, this yields the building blocks for everything PlanetScale Insights later surfaces: slow-query lists, digest-level latency dashboards, index suggestions, error tracking, query-shape change detection.

Why the proxy layer is the right place

The post names the alternatives and their failure modes:

When you experience database issues, capturing queries on a running system is a hard problem. Existing solutions, such as tcpdump, SHOW PROCESSLIST; or third-party monitoring agents, can cause additional system load or gather only a small subset of queries.

sources/2021-08-31-planetscale-optimizing-sql-with-query-statistics

| Alternative | Why it falls short |
| --- | --- |
| tcpdump | Captures every packet, but needs host access, imposes packet-processing overhead, and produces raw data that needs separate aggregation. |
| SHOW PROCESSLIST | Instantaneous snapshot — misses any query that finishes between polls. Point-in-time, not rate-based. |
| Third-party APM agents | Either add CPU load inside the server process or sample at the client (incomplete coverage). Often paid add-ons. |
| Performance Schema | Enabled in the server: requires MySQL restarts to tune, adds measurable overhead for high-cardinality query shapes. |

A proxy like Vitess sits naturally on the query path, already parsing SQL to make routing decisions. Adding per-query aggregation there imposes zero overhead on the database server — the cost is borne by the proxy layer that was already inspecting the query.

Overhead profile

"Without any overhead" is a specific claim with three components:

  1. Zero load on the MySQL server — the database process does not need performance_schema statement tables enabled for PlanetScale's query-stats feature to work.
  2. Zero sampling — every query is counted, not sampled. Unlike tcpdump-style capture, this is aggregated on the fly into per-pattern counters; the storage cost is per distinct query shape, not per query execution.
  3. Zero client-side instrumentation required — no SDK, no driver-wrapper, no JDBC shim. The client ships whatever SQL it ships and the proxy sees it.

(The proxy itself does pay a cost — query parsing, AST normalisation for pattern fingerprinting, counter updates, async publication to downstream storage — but this cost is amortised across the sharded proxy fleet and hidden from the customer's database footprint.)
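Pattern fingerprinting is what keeps the storage cost per query shape rather than per execution. A toy Go illustration of the idea, using regexes where a real implementation such as Vitess would normalise the parsed AST:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"regexp"
	"strings"
)

// Simplified literal normalisation. A production normaliser parses the SQL
// into an AST and replaces literal nodes; regexes only approximate that.
var (
	numLit = regexp.MustCompile(`\b\d+\b`)
	strLit = regexp.MustCompile(`'[^']*'`)
)

// fingerprint collapses literal values so every execution of the same query
// shape shares one counter key.
func fingerprint(sql string) string {
	q := strings.ToLower(sql)
	q = strLit.ReplaceAllString(q, "?")
	q = numLit.ReplaceAllString(q, "?")
	sum := sha256.Sum256([]byte(q))
	return fmt.Sprintf("%x", sum[:8])
}

func main() {
	a := fingerprint("SELECT * FROM backup WHERE id = 17")
	b := fingerprint("SELECT * FROM backup WHERE id = 99")
	fmt.Println(a == b) // → true: same shape, same digest
}
```

A million executions with a million different literals collapse into one map entry, which is what makes "count every query" affordable.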

2021 → 2026 evolution

This post captures the genesis of the feature. The 2021 version:

  • Per-query count, rows returned, duration.
  • A flagged slow-query list using a [[concepts/100ms-slow-query-threshold|100 ms threshold]].
  • Accessed via a web UI on the PlanetScale dashboard, referenced in the post as "the query statistics report".

By the 2024–2026 era the feature had evolved into PlanetScale Insights:

  • Query-pattern fingerprinting via query digest hashes (AST normalisation of literal values).
  • Per-pattern time-series in sharded MySQL with per-minute and per-hour rollups.
  • DDSketch-based percentile estimation for error-bounded latency histograms.
  • SQLCommenter tagging for actor attribution.
  • AI-powered index suggestions that validate candidates with HypoPG hypothetical-index EXPLAIN plans.
  • Traffic Control — the same query-stats infrastructure used for enforcement, not just observation.

The 2021 post is the foundational primitive on top of which every later evolution is built.

Use case: the feedback loop for optimisation

Query-statistics telemetry is what makes the query → diagnose → fix → verify loop possible at scale. The canonical example from the same post:

  1. Query-stats surfaces a slow pattern: SELECT * FROM backup WHERE deleted_at IS NULL AND expires_at <= CURRENT_TIMESTAMP ORDER BY id LIMIT 1000 — 719 ms average.
  2. EXPLAIN shows type: ALL (full table scan). The composite index (expires_at, data_deleted_at, deleted_at) was rejected.
  3. Redesign the index column order: (deleted_at, expires_at, data_deleted_at) (see composite-index column order).
  4. EXPLAIN the fix → type: range, rows: 1.
  5. Deploy the new index. Query-stats confirms the expected runtime — now under 20 ms.

Without query-statistics telemetry, step 1 doesn't happen — the problem stays latent until it grows into an incident. This is the argument for making query-stats always-on, zero-overhead, and surfaced in-product — the feature's whole value is the feedback loop it enables.
