PlanetScale — Query performance analysis with Insights¶
Summary¶
Rafer Hazen (PlanetScale, 2023-04-20) announces per-query-pattern
time-series metrics in PlanetScale
Insights and canonicalises the AST-walk literal-replacement
fingerprinting mechanism that groups raw SQL executions into
aggregatable patterns. The post is the earliest wiki citation of
Insights's query-pattern-as-unit-of-analysis design choice,
its use of Vitess's SQL parser as the
normalisation substrate, and its reliance on a sketch-based
error-bounded percentile estimator (citing
arXiv 1908.10693 / DDSketch)
so per-pattern p50 / p95 can be reported at high cardinality
without storing every execution. A worked production example
— a `DELETE … WHERE <cond> LIMIT 500` hourly job whose p95
latency spirals from a few hundred milliseconds in minutes 10–20
past the hour to ~15 minutes in minutes 20–30 as the
matching-row pool shrinks and scans approach full-table — is
fixed with a single index on the `minute` column; the post
also canonicalises the deploy-marker overlay (the #505
annotation on the latency graph) as the UI primitive tying
schema-change deploys to the time-series they affect.
Key takeaways¶
- Query pattern is the fundamental unit of Insights analysis. "The fundamental unit of analysis in Insights is a query pattern. … PlanetScale databases often receive thousands or even millions of queries per second, reporting performance stats for every individual query isn't usually what we want. Instead, we'd like aggregate data for similar queries over time. It's better to know 'This is how long id-based user lookups took over the last hour' versus 'This is how long it took to look up user 123.'" Canonical framing for query digests extended to Insights as the PlanetScale-specific UX. (Source: sources/2026-04-21-planetscale-query-performance-analysis-with-insights.)
- AST-walk literal-replacement produces the fingerprint. "To establish what queries should be grouped together, we use Vitess's SQL parser to find a normalized pattern for every query. The Vitess query serving layer converts SQL into an abstract syntax tree, which we then walk to replace literals with generic placeholders." `select * from users where id = 123` normalises to `select * from users where id = ?`. A hash of the normalised SQL is the fingerprint — the grouping key for all per-pattern aggregates. Canonicalised as concepts/query-pattern-fingerprint + the pattern patterns/ast-normalized-query-fingerprinting.
- AST normalisation eliminates more than just literals. "Beyond literal extraction, Vitess's AST normalization also helps eliminate surface-level syntactic differences, such as casing differences or the presence of redundant parentheses." Two queries that differ only in case or redundant parens normalise to the same fingerprint — an advantage over naive regex-based literal stripping.
- Per-pattern aggregates: count, total time, rows read/returned/written. "We can calculate a fingerprint (a hash of the normalized SQL) and use it to group queries and emit aggregated telemetry, such as the number of queries, total execution time, and total rows read, returned, and written." Canonical wiki disclosure of the specific aggregate axes Insights collects per pattern per interval — the substrate the later AI-powered index suggestions post uses as its input oracle (rows-read vs rows-returned ratio, runtime share, frequency).
- Error-bounded percentiles via sketches — not sampled histograms. "We also send along a sketch of query execution times that allows us to show error-bounded percentiles (e.g., median and p95)." Citation: arXiv 1908.10693 (DDSketch or equivalent relative-error quantile sketch). Earliest wiki citation of PlanetScale's use of a streaming quantile sketch for per-pattern percentile reporting — the structural answer to "how do you give every pattern a p95 without storing every execution?" Canonicalised as concepts/ddsketch-error-bounded-percentile.
- Delete-with-LIMIT asymptotes to full-table-scan as matches deplete. The worked example: "The query in question is a `delete` with a `where` clause and a `limit` … The limit is set to 500. When the hourly run starts, there are many rows that meet the conditions in the `where` clause, so the first 500 matches are found relatively quickly. As the job progresses, and more and more of the matching rows have already been deleted, it becomes more expensive to find deletable rows. Toward the end of an hourly run, the query approaches needing to do a full table scan for each execution. Since there are over 100 million rows in the table, this operation becomes untenably expensive." Minutes 10–20 past hour: "a few hundred milliseconds". Minutes 20–30 past hour: "almost 15 minutes". Canonical wiki disclosure of this specific antipattern and its signature shape on a time-series graph. Canonicalised as concepts/delete-limit-asymptote.
- Fix: add an index on the predicate column. "Add an index, of course! Adding an index to the `minute` column lets the database quickly identify and delete the rows that match the `where` clause. Since we're using PlanetScale, it's easy to add an index without a performance hit or downtime while the index is building, even on busy tables with hundreds of millions of rows." Post-fix latency "consistently under a few hundred milliseconds" — "so dramatically that they're not even visibly discernible from zero on the graph." Ties to concepts/secondary-index and the deploy-request workflow.
- Deploy markers overlay schema events on latency graphs. The annotation on the latency graph labelled `#505` — "our 505th deploy request to this database" — canonicalises the deploy-marker-on-time-series UX primitive: an operator-visible overlay tying a specific schema change to the post-deploy latency shape. Canonicalised as concepts/deploy-marker-annotation + patterns/deploy-marker-overlay-on-time-series. Sister to release-marker overlays in Datadog / New Relic / Honeycomb; PlanetScale-specific contribution is tying the marker to a deploy request number rather than a free-form tag.
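The fingerprint-then-aggregate mechanism in the takeaways above can be sketched in a few lines. This toy version normalises case and whitespace and strips literals with regexes rather than walking Vitess's AST (so, unlike the real implementation, it would not unify redundant parentheses), and every name in it is illustrative; only the core idea comes from the post: the hash of the normalised SQL is the grouping key for the count, total-time, and rows-read/returned/written aggregates.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(sql: str) -> str:
    """Toy query fingerprint: lowercase, replace literals, collapse
    whitespace, hash. (Insights replaces literals by walking Vitess's
    AST, not with regexes; this stand-in only illustrates the idea.)"""
    s = sql.strip().lower()
    s = re.sub(r"'(?:[^'\\]|\\.)*'", "?", s)   # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)   # numeric literals -> ?
    s = re.sub(r"\s+", " ", s)                 # collapse whitespace
    return hashlib.sha256(s.encode()).hexdigest()[:16]

# Per-pattern aggregates named in the post: query count, total
# execution time, and total rows read / returned / written.
aggregates = defaultdict(lambda: {"count": 0, "time_ms": 0.0,
                                  "rows_read": 0, "rows_returned": 0,
                                  "rows_written": 0})

def record(sql, time_ms, rows_read=0, rows_returned=0, rows_written=0):
    agg = aggregates[fingerprint(sql)]
    agg["count"] += 1
    agg["time_ms"] += time_ms
    agg["rows_read"] += rows_read
    agg["rows_returned"] += rows_returned
    agg["rows_written"] += rows_written

record("SELECT * FROM users WHERE id = 123", 1.2, rows_read=1, rows_returned=1)
record("select * from users where id = 456", 0.9, rows_read=1, rows_returned=1)
assert len(aggregates) == 1   # literals differ, fingerprint does not
```

Sorting `aggregates` by cumulative `time_ms` reproduces the digest-based prioritisation discipline noted under Patterns (extended) below.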
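The error-bounded-percentile claim can be made concrete with a minimal DDSketch-style sketch: log-spaced buckets bound the relative error of every quantile while storing only bucket counters. This is a from-scratch illustration of the cited technique, not PlanetScale's implementation; the 1% accuracy target and all names are assumptions.

```python
import math
from collections import Counter

class QuantileSketch:
    """Minimal DDSketch-style quantile sketch (cf. arXiv:1908.10693):
    log-spaced buckets give a relative-error guarantee on every
    quantile while never storing the raw execution times."""

    def __init__(self, rel_err: float = 0.01):
        self.gamma = (1 + rel_err) / (1 - rel_err)
        self.counts = Counter()   # bucket index -> observation count
        self.n = 0

    def add(self, value: float) -> None:
        # Bucket i covers the interval (gamma**(i-1), gamma**i].
        self.counts[math.ceil(math.log(value, self.gamma))] += 1
        self.n += 1

    def merge(self, other: "QuantileSketch") -> None:
        # Sketches combine by adding counters, which keeps
        # per-pattern, per-interval aggregation cheap.
        self.counts.update(other.counts)
        self.n += other.n

    def quantile(self, q: float) -> float:
        rank, seen = q * (self.n - 1), 0
        for bucket in sorted(self.counts):
            seen += self.counts[bucket]
            if seen > rank:
                # Bucket midpoint; guaranteed within rel_err of truth.
                return 2 * self.gamma ** bucket / (self.gamma + 1)
        raise ValueError("empty sketch")

sketch = QuantileSketch(rel_err=0.01)
for ms in range(1, 1001):   # pretend per-execution latencies in ms
    sketch.add(ms)
# sketch.quantile(0.5) / sketch.quantile(0.95) land within ~1% of the
# true p50 / p95 while holding only a few hundred bucket counters.
```

The mergeability is the operational point: per-pattern sketches from many intervals or shards collapse by counter addition, so every pattern gets a p50/p95 without any execution-level storage.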
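The delete-with-LIMIT asymptote follows from a simple cost model: with no index and matching rows spread uniformly, finding one deletable row in an N-row table costs roughly N / remaining row examinations, so per-batch cost climbs as the pool drains. The 100-million-row table and LIMIT 500 are from the post; the starting pool of one million matching rows is a made-up number for illustration.

```python
# Toy cost model for the hourly DELETE ... WHERE <cond> LIMIT 500 job.
# Assumption (not from the post): matching rows are spread uniformly,
# so an unindexed scan examines about N / remaining rows per match.
N = 100_000_000        # "over 100 million rows" in the target table
LIMIT = 500            # rows deleted per execution
remaining = 1_000_000  # hypothetical matching-row pool at job start

scans_per_batch = []
while remaining >= LIMIT:
    scans_per_batch.append(LIMIT * N // remaining)  # rows examined
    remaining -= LIMIT

# scans_per_batch[0]  == 50_000       (cheap early batches)
# scans_per_batch[-1] == 100_000_000  (full-table scan at the end)
```

The hockey-stick growth in rows examined is the signature shape the post describes for minutes 20–30 past the hour. An index on the predicate column makes each batch cost proportional to LIMIT instead of N / remaining, which is why post-fix latency is "not even visibly discernible from zero on the graph."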
Systems / concepts / patterns extracted¶
Systems — systems/planetscale-insights (per-query-pattern time-series UX), systems/vitess (SQL parser used as the AST-normalisation substrate), systems/mysql (underlying query engine producing the raw SQL executions), systems/planetscale (product).
Concepts (new) — concepts/query-pattern-fingerprint, concepts/ast-literal-normalization, concepts/ddsketch-error-bounded-percentile, concepts/per-pattern-time-series, concepts/deploy-marker-annotation, concepts/delete-limit-asymptote.
Concepts (extended) — concepts/query-digest (Vitess AST-normalisation
is the PlanetScale-specific implementation of the
digest concept previously canonicalised at the MySQL
performance_schema altitude), concepts/secondary-index
(production worked example of index-fix-from-tail-latency),
concepts/observability (per-pattern time-series as
observability primitive), concepts/deploy-request
(deploy-request number surfaces on the latency graph as a marker).
Patterns (new) — patterns/ast-normalized-query-fingerprinting (walk the AST, replace literals with placeholders, hash; subsumes regex-based digest normalisation), patterns/deploy-marker-overlay-on-time-series (overlay schema-change deploy events on the per-pattern latency graph), patterns/index-from-tail-latency-pattern (recognise full-table-scan asymptote in the p95 time-series shape, propose an index on the predicate column).
Patterns (extended) — patterns/digest-based-query-prioritization
(Insights's UI surfaces sort-by-cumulative-time per pattern —
the productised form of the `ORDER BY SUM_TIMER_WAIT DESC`
discipline from Dicken's MySQL-native workflow).
Operational numbers¶
- Table size: "over 100 million rows" on the DELETE-target table.
- LIMIT value: 500 rows per DELETE execution.
- Kickoff cadence: hourly, 10 past the hour.
- Pre-fix latency profile:
- Minutes 10–20 past hour: "a few hundred milliseconds" mean/p95.
- Minutes 20–30 past hour: "almost 15 minutes" — ~3,000× slowdown within the same hourly run.
- Headline row: "approximately 8 seconds to run on average" (main Insights query-stats row, averaged across the hour).
- Post-fix latency: "consistently under a few hundred milliseconds"; "not even visibly discernible from zero on the graph".
- Deploy-request number: `#505` (the 505th deploy request to this database) — first-party signal that PlanetScale's production database is itself a heavy dogfooder of the deploy-request workflow.
- Claimed baseline throughput: Insights is designed for databases receiving "thousands or even millions of queries per second" — back-references the 1M-connections substrate that makes this volume plausible.
Caveats¶
- Short post (~900 words), more product-announcement than architecture deep-dive on the pipeline altitude. The fingerprint-generation mechanism is sketched at one level of abstraction ("walk the AST, replace literals"); collision-resistance of the fingerprint hash, normalisation coverage (does it normalise `IN (?, ?, ?)` vs `IN (?, ?, ?, ?, ?)` to the same pattern? the post doesn't say), and per-pattern cardinality bounds are all elided. The later enhanced-tagging post (2026-03-24) canonicalises the aggregate-stream Kafka+ClickHouse architecture at higher altitude; this 2023-04-20 post is the earliest wiki citation of the fingerprinting substrate beneath that pipeline.
- Sketch algorithm referenced by arXiv paper number (1908.10693) without being named in-post. Reader must follow the citation to identify the specific algorithm (DDSketch-family relative-error quantile sketch). No per-pattern sketch memory footprint, merge cost, or relative-error bound disclosed.
- Worked example is anecdotal, not quantified end-to-end. No graphs with y-axis values; "almost 15 minutes" / "a few hundred milliseconds" are the full quantitative budget. No disclosure of how many similar `DELETE … LIMIT 500` shapes PlanetScale's production database has, or what fraction of its queries qualify for the antipattern.
- Index-add is presented as frictionless ("it's easy to add an index without a performance hit or downtime while the index is building, even on busy tables with hundreds of millions of rows") — that claim is the canonicalised gh-ost-style online schema change substrate covered at depth elsewhere on the wiki; this post treats it as given.
- Product-announcement framing: closing line "To see query pattern metrics in your database right now, click on a query from the table in your databases' Insights tab. Try it out, and let us know what you think!" Re-fetched from 2023-04-20 via RSS 2026-04-21 — three years of Insights evolution since, notably the Kafka+ClickHouse dual-stream pipeline + tagging system + anomaly detection + LLM-index-suggestions, all canonicalised on the Insights system page.
- Rafer Hazen byline — Hazen is PlanetScale's canonical Insights engineer and default-include voice on Insights-architecture posts (prior wiki ingests: 2022-09-27 debugging-database-errors, 2023-11-28 Insights Anomalies, AI-powered index suggestions, enhanced tagging). This post is his earliest wiki-ingested Insights post, providing the fingerprinting-substrate layer beneath all subsequent Insights ingests.
Source¶
- Original: https://planetscale.com/blog/query-performance-analysis-with-insights
- Raw markdown: raw/planetscale/2026-04-21-query-performance-analysis-with-insights-63740f1d.md
Related¶
- systems/planetscale-insights
- systems/vitess
- systems/mysql
- systems/planetscale
- concepts/query-digest
- concepts/query-pattern-fingerprint
- concepts/ast-literal-normalization
- concepts/ddsketch-error-bounded-percentile
- concepts/per-pattern-time-series
- concepts/deploy-marker-annotation
- concepts/delete-limit-asymptote
- concepts/secondary-index
- concepts/observability
- concepts/deploy-request
- patterns/ast-normalized-query-fingerprinting
- patterns/deploy-marker-overlay-on-time-series
- patterns/index-from-tail-latency-pattern
- patterns/digest-based-query-prioritization
- companies/planetscale