PATTERN Cited by 1 source
AST-normalized query fingerprinting¶
Problem. An observability system needs to aggregate
metrics across a stream of SQL executions where every
execution has different literal values (id = 1, id = 2,
id = 3), different casing, different whitespace. Per-
execution metrics don't scale; regex-based literal stripping
misses too many equivalences.
Solution. Produce the grouping key by parsing the SQL to an AST, walking the tree to replace literals with placeholders and normalise surface-syntax differences, then hashing the canonical rendering. Two executions share a fingerprint iff their ASTs are equivalent after this normalisation pass.
(Source: sources/2026-04-21-planetscale-query-performance-analysis-with-insights.)
Mechanism¶
- Own a SQL parser — or borrow one. PlanetScale reuses Vitess's query-serving-layer parser so the observability substrate and the query-execution substrate agree on parse semantics.
- Walk the AST, replacing literal nodes with a
generic placeholder (
?). Typical literal types: numeric, string, boolean, date,NULL, array literals. - Normalise surface-syntax during the walk:
- Keyword case (
SELECT⇌select). - Identifier case (dialect-specific).
- Redundant parentheses.
- Whitespace collapsing.
- Trailing semicolons.
- Serialise the normalised AST back to SQL.
- Hash the canonical string — the hash is the query pattern fingerprint (concepts/query-pattern-fingerprint).
Why not regex?¶
Regex-based literal stripping (the common shortcut) breaks on quoting, parens, nested expressions, and keyword casing. The AST-walk approach round-trips through the grammar so all of these collapse by construction rather than by hand- rolled special cases.
PlanetScale's post calls this out explicitly: "Beyond literal extraction, Vitess's AST normalization also helps eliminate surface-level syntactic differences, such as casing differences or the presence of redundant parentheses."
Structural consequence: the fingerprint is the¶
aggregation axis
Once fingerprinting is stable:
- Counts per fingerprint per interval — how often does this pattern run?
- Cumulative time per fingerprint — where is the workload spending time? (patterns/digest-based-query-prioritization).
- Latency sketch per fingerprint — p95 per pattern (concepts/ddsketch-error-bounded-percentile).
- Rows-read / rows-returned ratio per fingerprint — index-opportunity signal.
All of these become cheap once the fingerprint collapses the high-cardinality query stream to a low-cardinality pattern stream.
Sharing the parser is a load-bearing choice¶
PlanetScale's substrate — Vitess's query planner — is already a SQL parser in the request path. Reusing it for telemetry normalisation means:
- The observability substrate and the execution substrate agree on SQL grammar by construction.
- Bug-for-bug compatibility — new SQL shapes supported by the planner are supported by fingerprinting too.
- No duplicate parser to keep in sync.
Systems without a first-party parser on the request path have to pick between a shared-library parser (risk of grammar drift) or building one just for observability (duplicate maintenance burden).
Seen in¶
- sources/2026-04-21-planetscale-query-performance-analysis-with-insights
— canonical wiki disclosure. Rafer Hazen, 2023-04-20,
canonicalises Insights'
reuse of the Vitess SQL parser for
AST-walk literal replacement + surface-syntax
normalisation. Worked example:
select * from users where id = 123normalises toselect * from users where id = ?, which hashes to the fingerprint.