Skip to content

PATTERN Cited by 1 source

AST-normalized query fingerprinting

Problem. An observability system needs to aggregate metrics across a stream of SQL executions where every execution has different literal values (id = 1, id = 2, id = 3), different casing, different whitespace. Per- execution metrics don't scale; regex-based literal stripping misses too many equivalences.

Solution. Produce the grouping key by parsing the SQL to an AST, walking the tree to replace literals with placeholders and normalise surface-syntax differences, then hashing the canonical rendering. Two executions share a fingerprint iff their ASTs are equivalent after this normalisation pass.

(Source: sources/2026-04-21-planetscale-query-performance-analysis-with-insights.)

Mechanism

  1. Own a SQL parser — or borrow one. PlanetScale reuses Vitess's query-serving-layer parser so the observability substrate and the query-execution substrate agree on parse semantics.
  2. Walk the AST, replacing literal nodes with a generic placeholder (?). Typical literal types: numeric, string, boolean, date, NULL, array literals.
  3. Normalise surface-syntax during the walk:
  4. Keyword case (SELECTselect).
  5. Identifier case (dialect-specific).
  6. Redundant parentheses.
  7. Whitespace collapsing.
  8. Trailing semicolons.
  9. Serialise the normalised AST back to SQL.
  10. Hash the canonical string — the hash is the query pattern fingerprint (concepts/query-pattern-fingerprint).

Why not regex?

Regex-based literal stripping (the common shortcut) breaks on quoting, parens, nested expressions, and keyword casing. The AST-walk approach round-trips through the grammar so all of these collapse by construction rather than by hand- rolled special cases.

PlanetScale's post calls this out explicitly: "Beyond literal extraction, Vitess's AST normalization also helps eliminate surface-level syntactic differences, such as casing differences or the presence of redundant parentheses."

Structural consequence: the fingerprint is the

aggregation axis

Once fingerprinting is stable:

All of these become cheap once the fingerprint collapses the high-cardinality query stream to a low-cardinality pattern stream.

Sharing the parser is a load-bearing choice

PlanetScale's substrate — Vitess's query planner — is already a SQL parser in the request path. Reusing it for telemetry normalisation means:

  • The observability substrate and the execution substrate agree on SQL grammar by construction.
  • Bug-for-bug compatibility — new SQL shapes supported by the planner are supported by fingerprinting too.
  • No duplicate parser to keep in sync.

Systems without a first-party parser on the request path have to pick between a shared-library parser (risk of grammar drift) or building one just for observability (duplicate maintenance burden).

Seen in

Last updated · 470 distilled / 1,213 read