CONCEPT

Query pattern fingerprint¶

A query pattern fingerprint is a stable hash of a normalised SQL statement — the grouping key that tells an observability system "these two executions are the same pattern." Two SQL executions share a fingerprint iff they parse to ASTs that are identical after literal-replacement and surface-syntax normalisation.

(Source: .)

How it is built¶

PlanetScale's Insights builds the fingerprint via Vitess's SQL parser:

Parse SQL → AST using Vitess's query serving layer.
Walk the AST, replacing every literal (numbers, strings, booleans) with a generic ? placeholder. Also normalise surface-level syntactic differences — "casing differences or the presence of redundant parentheses".
Render the normalised AST back to SQL as a canonical string.
Hash the canonical string to produce the fingerprint.

Rafer Hazen, 2023-04-20: "we use Vitess's SQL parser to find a normalized pattern for every query. The Vitess query serving layer converts SQL into an abstract syntax tree, which we then walk to replace literals with generic placeholders."

Worked example:

select * from users where id = 123
↓ (literal replacement)
select * from users where id = ?
↓ (hash)
<fingerprint-hex>

Relationship to query digest ¶

A query digest is the general category; a query pattern fingerprint is PlanetScale's specific implementation via Vitess AST normalisation. Key distinctions from MySQL's native performance_schema digest:

Axis	`performance_schema` digest	Insights fingerprint
Normalisation engine	MySQL's internal digest	Vitess SQL parser
Placement	Per-MySQL-instance	Produced in VTGate / serving layer
Surface-syntax normalisation	Limited	Casing + redundant parens + AST-level equivalence
Storage	`events_statements_summary_by_digest`	Kafka → ClickHouse (dual-stream)
Cardinality	Per-instance	Cluster-wide aggregate

Why AST-walk > regex¶

Regex-based literal replacement can miss cases where two queries are semantically the same but lexically different:

SELECT * FROM users WHERE id=1 vs select * from users where id = 1 (casing + whitespace).
SELECT a, b FROM t WHERE (x = 1) vs SELECT a, b FROM t WHERE x = 1 (redundant parens).

AST-walk normalisation gives these the same fingerprint because they round-trip to the same normalised AST.

Seen in¶

— Hazen, 2023-08-10, discloses the fingerprint's downstream role as the Kafka partition key (composed with database_id): hash(database_id, fingerprint) is the Kafka key, yielding partition affinity per database × pattern. Canonical flush cadence: 15 seconds per fingerprint per VTGate. Per-15-second aggregate messages are merged in-memory per consumer batch (concepts/in-memory-coalescing-by-kafka-key) — a composition that relies on the fingerprint being stable across VTGate instances for the partition-affinity property to hold. This post is the pipeline-layer framing of the fingerprint's role; the 2023-04-20 sister post is the substrate-layer framing of how the fingerprint is produced.
— Hazen canonicalises fingerprinting as the fundamental unit of Insights analysis and discloses Vitess's SQL parser as the normalisation substrate. Earliest wiki citation of PlanetScale's specific fingerprint mechanism.

Query pattern fingerprint¶

How it is built¶

Relationship to query digest¶

Why AST-walk > regex¶

Seen in¶

Related¶

Relationship to query digest ¶