CONCEPT Cited by 2 sources
Query pattern fingerprint¶
A query pattern fingerprint is a stable hash of a normalised SQL statement — the grouping key that tells an observability system "these two executions are the same pattern." Two SQL executions share a fingerprint iff they parse to ASTs that are identical after literal-replacement and surface-syntax normalisation.
(Source: sources/2026-04-21-planetscale-query-performance-analysis-with-insights.)
How it is built¶
PlanetScale's Insights builds the fingerprint via Vitess's SQL parser:
- Parse SQL → AST using Vitess's query serving layer.
- Walk the AST, replacing every literal (numbers, strings,
booleans) with a generic
?placeholder. Also normalise surface-level syntactic differences — "casing differences or the presence of redundant parentheses". - Render the normalised AST back to SQL as a canonical string.
- Hash the canonical string to produce the fingerprint.
Rafer Hazen, 2023-04-20: "we use Vitess's SQL parser to find a normalized pattern for every query. The Vitess query serving layer converts SQL into an abstract syntax tree, which we then walk to replace literals with generic placeholders."
Worked example:
select * from users where id = 123
↓ (literal replacement)
select * from users where id = ?
↓ (hash)
<fingerprint-hex>
Relationship to query digest¶
A query digest is the general category; a query pattern
fingerprint is PlanetScale's specific implementation via
Vitess AST normalisation. Key distinctions from MySQL's
native performance_schema digest:
| Axis | performance_schema digest |
Insights fingerprint |
|---|---|---|
| Normalisation engine | MySQL's internal digest | Vitess SQL parser |
| Placement | Per-MySQL-instance | Produced in VTGate / serving layer |
| Surface-syntax normalisation | Limited | Casing + redundant parens + AST-level equivalence |
| Storage | events_statements_summary_by_digest |
Kafka → ClickHouse (dual-stream) |
| Cardinality | Per-instance | Cluster-wide aggregate |
Why AST-walk > regex¶
Regex-based literal replacement can miss cases where two queries are semantically the same but lexically different:
SELECT * FROM users WHERE id=1vsselect * from users where id = 1(casing + whitespace).SELECT a, b FROM t WHERE (x = 1)vsSELECT a, b FROM t WHERE x = 1(redundant parens).
AST-walk normalisation gives these the same fingerprint because they round-trip to the same normalised AST.
Seen in¶
-
sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights — Hazen, 2023-08-10, discloses the fingerprint's downstream role as the Kafka partition key (composed with
database_id):hash(database_id, fingerprint)is the Kafka key, yielding partition affinity per database × pattern. Canonical flush cadence: 15 seconds per fingerprint per VTGate. Per-15-second aggregate messages are merged in-memory per consumer batch (concepts/in-memory-coalescing-by-kafka-key) — a composition that relies on the fingerprint being stable across VTGate instances for the partition-affinity property to hold. This post is the pipeline-layer framing of the fingerprint's role; the 2023-04-20 sister post is the substrate-layer framing of how the fingerprint is produced. -
sources/2026-04-21-planetscale-query-performance-analysis-with-insights — Hazen canonicalises fingerprinting as the fundamental unit of Insights analysis and discloses Vitess's SQL parser as the normalisation substrate. Earliest wiki citation of PlanetScale's specific fingerprint mechanism.