CONCEPT

AST literal normalization¶

AST literal normalization is the technique of producing a canonical form of a SQL statement by parsing it to an abstract syntax tree, walking the tree, and replacing every literal value with a generic placeholder. The normalised AST is rendered back to SQL as the canonical string used for grouping, fingerprinting, or cache-keying.

(Source: .)

Mechanism¶

Parse the SQL to an AST.
Walk the tree; for each literal node (integer, string, float, boolean, date), replace its value with a placeholder (? in most SQL-family grammars).
Optionally normalise other surface-level differences during the walk — casing of keywords/identifiers, redundant parentheses, whitespace collapsing, trailing-semicolon presence.
Serialise the normalised AST back to SQL.

Rafer Hazen, 2023-04-20: "The Vitess query serving layer converts SQL into an abstract syntax tree, which we then walk to replace literals with generic placeholders. Applied to the query above, we find the following normalized representation."

Worked example:

select * from users where id = 123
↓
select * from users where id = ?

"Beyond literal extraction, Vitess's AST normalization also helps eliminate surface-level syntactic differences, such as casing differences or the presence of redundant parentheses."

Why it beats regex-based stripping¶

Regex-based literal stripping breaks on:

Nested quoting ('it\'s a string').
Parenthesisation variance (WHERE x = 1 vs WHERE (x = 1)).
Keyword case (SELECT vs select).
Identifier quoting (`users` vs users).
Comment placement.

AST walking round-trips through a grammar-correct parser, so all of these collapse to the same canonical form without needing hand-rolled regex special cases.

Uses¶

Query fingerprinting for observability telemetry — concepts/query-pattern-fingerprint, systems/planetscale-insights.
Prepared-statement rewriting — turn a literal-bearing SQL into a parameterised prepared statement.
Query cache keying — two cosmetically different queries that are semantically the same produce the same cache key.
Schema-change impact analysis — walk query logs, normalise, and check whether any pattern references a column being dropped (patterns/warn-on-drop-recently-queried).

Seen in¶

— canonical wiki disclosure that PlanetScale Insights uses Vitess's SQL parser for AST-walk literal replacement as the fingerprint substrate. Post calls out both literal replacement and surface-syntax normalisation (casing, redundant parens) as in-scope for the walk.

AST literal normalization¶

Mechanism¶

Why it beats regex-based stripping¶

Uses¶

Seen in¶

Related¶