Skip to content

CONCEPT Cited by 2 sources

Query pattern fingerprint

A query pattern fingerprint is a stable hash of a normalised SQL statement — the grouping key that tells an observability system "these two executions are the same pattern." Two SQL executions share a fingerprint iff they parse to ASTs that are identical after literal-replacement and surface-syntax normalisation.

(Source: sources/2026-04-21-planetscale-query-performance-analysis-with-insights.)

How it is built

PlanetScale's Insights builds the fingerprint via Vitess's SQL parser:

  1. Parse SQL → AST using Vitess's query serving layer.
  2. Walk the AST, replacing every literal (numbers, strings, booleans) with a generic ? placeholder. Also normalise surface-level syntactic differences — "casing differences or the presence of redundant parentheses".
  3. Render the normalised AST back to SQL as a canonical string.
  4. Hash the canonical string to produce the fingerprint.

Rafer Hazen, 2023-04-20: "we use Vitess's SQL parser to find a normalized pattern for every query. The Vitess query serving layer converts SQL into an abstract syntax tree, which we then walk to replace literals with generic placeholders."

Worked example:

select * from users where id = 123
↓ (literal replacement)
select * from users where id = ?
↓ (hash)
<fingerprint-hex>

Relationship to query digest

A query digest is the general category; a query pattern fingerprint is PlanetScale's specific implementation via Vitess AST normalisation. Key distinctions from MySQL's native performance_schema digest:

Axis performance_schema digest Insights fingerprint
Normalisation engine MySQL's internal digest Vitess SQL parser
Placement Per-MySQL-instance Produced in VTGate / serving layer
Surface-syntax normalisation Limited Casing + redundant parens + AST-level equivalence
Storage events_statements_summary_by_digest Kafka → ClickHouse (dual-stream)
Cardinality Per-instance Cluster-wide aggregate

Why AST-walk > regex

Regex-based literal replacement can miss cases where two queries are semantically the same but lexically different:

  • SELECT * FROM users WHERE id=1 vs select * from users where id = 1 (casing + whitespace).
  • SELECT a, b FROM t WHERE (x = 1) vs SELECT a, b FROM t WHERE x = 1 (redundant parens).

AST-walk normalisation gives these the same fingerprint because they round-trip to the same normalised AST.

Seen in

  • sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights — Hazen, 2023-08-10, discloses the fingerprint's downstream role as the Kafka partition key (composed with database_id): hash(database_id, fingerprint) is the Kafka key, yielding partition affinity per database × pattern. Canonical flush cadence: 15 seconds per fingerprint per VTGate. Per-15-second aggregate messages are merged in-memory per consumer batch (concepts/in-memory-coalescing-by-kafka-key) — a composition that relies on the fingerprint being stable across VTGate instances for the partition-affinity property to hold. This post is the pipeline-layer framing of the fingerprint's role; the 2023-04-20 sister post is the substrate-layer framing of how the fingerprint is produced.

  • sources/2026-04-21-planetscale-query-performance-analysis-with-insights — Hazen canonicalises fingerprinting as the fundamental unit of Insights analysis and discloses Vitess's SQL parser as the normalisation substrate. Earliest wiki citation of PlanetScale's specific fingerprint mechanism.

Last updated · 470 distilled / 1,213 read