PATTERN Cited by 1 source

Handler-hook sidecar telemetry

Problem. You want per-query telemetry from a production database — the "what actually happened during this single execution" granularity — at 100% coverage, without extra round-trips, without sampling, without significant per-query overhead, and without polling external counter tables.

Solution. Intercept at the storage-engine handler (or equivalent narrow-waist abstraction) to capture the telemetry datum during query execution. Store it in a per-query data structure attached to the server thread's context. When the query completes, bolt the datum onto the existing wire-protocol response packet as a sidecar field. Let the proxy / client layer parse and aggregate it downstream.

Canonicalised by PlanetScale's Insights for per-query index-usage telemetry: hook InnoDB's index_init() callback, accumulate the per-query used-index set in the thread-local THD context, emit it in the final MySQL wire-protocol packet, let VTGate aggregate per fingerprint and ship to the telemetry pipeline every 15 seconds. (Source: sources/2026-04-21-planetscale-tracking-index-usage-with-insights.)
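Stripped of MySQL internals, the capture stage can be sketched in Go. This is a simulation under stated assumptions, not PlanetScale's actual C++ patch: `QueryContext` stands in for the per-query structure on the thread's THD context, and `OnIndexInit` for the InnoDB handler hook.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// QueryContext is the per-query data structure attached to the server
// thread's context. It accumulates the telemetry datum (here: the set
// of indexes touched) while the query executes. Illustrative name,
// not MySQL's THD.
type QueryContext struct {
	usedIndexes map[string]struct{}
}

func NewQueryContext() *QueryContext {
	return &QueryContext{usedIndexes: map[string]struct{}{}}
}

// OnIndexInit plays the handler hook: the storage engine calls it each
// time it opens an index for this query. Set semantics keep the datum
// small and bounded even if the hook fires many times per query.
func (q *QueryContext) OnIndexInit(indexName string) {
	q.usedIndexes[indexName] = struct{}{}
}

// Sidecar renders the datum as the small field bolted onto the final
// response packet when the query completes.
func (q *QueryContext) Sidecar() string {
	names := make([]string, 0, len(q.usedIndexes))
	for n := range q.usedIndexes {
		names = append(names, n)
	}
	sort.Strings(names)
	return strings.Join(names, ",")
}

func main() {
	ctx := NewQueryContext()
	// The engine fires the hook as it walks the query plan.
	ctx.OnIndexInit("users_by_email")
	ctx.OnIndexInit("PRIMARY")
	ctx.OnIndexInit("users_by_email") // duplicate firings collapse
	fmt.Println(ctx.Sidecar())        // → PRIMARY,users_by_email
}
```

The hot-path cost is one map insert per hook firing; serialisation happens once, at query end.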

The three-stage shape

┌─────────────────┐    handler hook     ┌─────────────────┐
│ Storage engine  │────── fires ──────▶ │ Per-query       │
│ (e.g. InnoDB)   │  (index_init, etc)  │ data structure  │
└─────────────────┘                     │ on thread ctx   │
                                        └────────┬────────┘
                                                 │ on query end
┌─────────────────┐    sidecar field    ┌─────────────────┐
│ Client / proxy  │◀────  wire packet  ─│ Response packet │
│ (e.g. VTGate)   │   (final packet)    │ (final/trailer) │
└────────┬────────┘                     └─────────────────┘
         │ aggregate per fingerprint
┌─────────────────┐
│ Telemetry pipe  │
│ (Kafka, ...)    │
└─────────────────┘

Why this shape

The composition gives you a set of properties that neither of the two natural alternatives achieves on its own:

| Approach | Coverage | Overhead | Latency | Integration |
| --- | --- | --- | --- | --- |
| External polling (e.g. performance_schema counters via a scraper) | Sampled, cumulative | Scraper cost + counter overhead | Seconds to minutes (scrape interval) | Works against unmodified server |
| Sidecar trace exporter (OpenTelemetry-style, per-query span) | 100% or sampled | Per-query export call | Sub-second | Requires serialiser + sidecar endpoint |
| Handler-hook + wire-protocol sidecar (this pattern) | 100%, per-execution | Zero extra round-trips | Zero (rides on result packet) | Requires engine fork |

The cost is that you own a fork of the server (or an engine). The benefit is that your telemetry has the finest possible granularity and the lowest possible overhead.

When to use it

  • You already own the server / storage-engine distribution (PlanetScale's MySQL fork, Vitess's engine integration, a proprietary engine you ship).
  • You need per-query datum granularity — set membership, resource accounting, per-request identity — that summary counters cannot provide.
  • You care about no extra round-trips — telemetry piggybacks on the existing response flow.
  • The datum is small (tens to hundreds of bytes), so the sidecar doesn't materially bloat response sizes.
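The size constraint can be enforced at encode time rather than hoped for. A hedged sketch in Go; the 192-byte cap and the one-byte length prefix are illustrative choices, not MySQL's wire format:

```go
package main

import (
	"fmt"
	"strings"
)

// maxSidecarBytes bounds the telemetry payload so it stays in the
// "tens to hundreds of bytes" regime. Illustrative value.
const maxSidecarBytes = 192

// encodeSidecar joins the per-query index set into a length-prefixed
// field. A pathological query (hundreds of indexes touched) gets
// truncated rather than allowed to bloat the response packet.
func encodeSidecar(indexNames []string) []byte {
	payload := strings.Join(indexNames, ",")
	if len(payload) > maxSidecarBytes {
		payload = payload[:maxSidecarBytes]
	}
	// One length byte, then the payload: a field a reader can skip
	// wholesale without understanding its contents.
	out := make([]byte, 0, len(payload)+1)
	out = append(out, byte(len(payload)))
	return append(out, payload...)
}

func main() {
	b := encodeSidecar([]string{"PRIMARY", "users_by_email"})
	fmt.Println(int(b[0]), string(b[1:]))
}
```

Hard-capping at the producer keeps the "zero-cost piggyback" premise honest regardless of query shape.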

When not to use it

  • You don't own the server distribution. Upstream MySQL / Postgres / etc. do not expose a stable handler-hook API for this pattern, so patching downstream couples your telemetry to your fork's maintenance burden.
  • The datum is large — streaming row-level attribution data in every response packet is a bandwidth tax.
  • You need write-path coverage and the response-packet format doesn't have a good slot for the sidecar on write responses. (This is why PlanetScale's Insights index-usage telemetry is SELECT-only — the wire-protocol sidecar rides in result-set packets, and UPDATE / DELETE responses have different framing.)
  • The hook has high firing frequency — per-row hooks are almost always too hot to decorate with per-query-data-structure writes.

Related concepts

  • Async Kafka publication for telemetry (concepts/async-kafka-publication-for-telemetry) — the downstream complement: VTGate doesn't synchronously emit to Kafka on every query; it buffers to an in-memory queue and flushes asynchronously. Handler-hook-sidecar captures cheaply on the hot path; async publication ships it off the hot path.
  • Per-pattern time series (concepts/per-pattern-time-series) — the aggregation substrate on top of the sidecar data. The sidecar yields per-execution data; the time-series surface rolls it up per pattern per interval.
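The division of labour between hot-path capture and off-path publication can be sketched as a proxy-side aggregator. All names here are illustrative; the `sink` callback and the interval stand in for VTGate's 15-second Kafka flush:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Aggregator rolls per-query sidecar data up per query fingerprint
// and ships it off the hot path on a fixed interval, so no query
// ever waits on the telemetry pipeline.
type Aggregator struct {
	mu     sync.Mutex
	counts map[string]map[string]int // fingerprint -> index -> count
}

func NewAggregator() *Aggregator {
	return &Aggregator{counts: map[string]map[string]int{}}
}

// Record runs on the hot path once per completed query. It only
// touches an in-memory map under a short lock.
func (a *Aggregator) Record(fingerprint string, usedIndexes []string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	m := a.counts[fingerprint]
	if m == nil {
		m = map[string]int{}
		a.counts[fingerprint] = m
	}
	for _, idx := range usedIndexes {
		m[idx]++
	}
}

// Flush swaps the buffer out and returns it; the caller hands the
// result to the telemetry pipeline (Kafka, in PlanetScale's case).
func (a *Aggregator) Flush() map[string]map[string]int {
	a.mu.Lock()
	defer a.mu.Unlock()
	out := a.counts
	a.counts = map[string]map[string]int{}
	return out
}

// Run drains the buffer on an interval, off the hot path.
func (a *Aggregator) Run(interval time.Duration, sink func(map[string]map[string]int)) {
	for range time.Tick(interval) {
		sink(a.Flush())
	}
}

func main() {
	agg := NewAggregator()
	agg.Record("select * from users where email = ?", []string{"users_by_email"})
	agg.Record("select * from users where email = ?", []string{"users_by_email"})
	fmt.Println(agg.Flush()["select * from users where email = ?"]["users_by_email"]) // → 2
}
```

The swap-and-replace in `Flush` keeps the lock hold time constant regardless of how much data the interval accumulated.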

Caveats

  • Fork maintenance burden — patches against internal storage-handler APIs can break on upstream version bumps. Rebase discipline required.
  • ABI fragility — handler APIs are semi-private; signature drift between upstream MySQL / MariaDB / Percona forks means patches are fork-specific.
  • Wire-protocol bloat — payload must be bounded and small. Large per-query telemetry sidecars violate the "zero-cost piggyback" premise of the pattern.
  • Client-side parsing — every client / proxy that reads server responses needs to be aware of the sidecar field. Non-aware clients ignore it; non-aware intermediaries may choke if the field disrupts wire-protocol parsing. Implementation discipline: embed the sidecar in a field that unaware clients gracefully ignore (e.g. a post-status OK-packet extension on the tail).
  • SELECT-only coverage in PlanetScale's implementation — the wire-protocol sidecar rides in the final packet of result-set queries. UPDATE / DELETE paths that return OK_Packet with affected_rows instead of a result set would need a different sidecar slot — not implemented in Insights as of 2024-08-14.
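The "unaware clients gracefully ignore it" discipline from the parsing caveat can be sketched as a length-prefixed trailer: aware readers decode it, while readers that stop at the base protocol's declared packet length never see it. Illustrative Go, not MySQL's actual OK-packet layout:

```go
package main

import "fmt"

// appendTrailer bolts a length-prefixed sidecar onto an existing
// response packet without disturbing the original bytes, so a client
// that only reads the declared packet body never touches it.
func appendTrailer(packet, sidecar []byte) []byte {
	out := append([]byte{}, packet...)
	out = append(out, byte(len(sidecar)))
	return append(out, sidecar...)
}

// readSidecar is what an aware proxy does: look past the original
// body for the trailer. bodyLen is the length the base protocol
// already declares for the packet.
func readSidecar(packet []byte, bodyLen int) ([]byte, bool) {
	if len(packet) <= bodyLen {
		return nil, false // no trailer: plain upstream packet
	}
	n := int(packet[bodyLen])
	if bodyLen+1+n > len(packet) {
		return nil, false // malformed trailer: ignore it
	}
	return packet[bodyLen+1 : bodyLen+1+n], true
}

func main() {
	base := []byte{0x00, 0x01, 0x02} // stand-in for an OK-packet body
	wire := appendTrailer(base, []byte("PRIMARY"))
	// An unaware client reads len(base) bytes and stops; an aware
	// proxy looks for the trailer:
	sc, ok := readSidecar(wire, len(base))
	fmt.Println(ok, string(sc)) // → true PRIMARY
}
```

The decoder treats any malformed or absent trailer as "no sidecar", which is the failure mode you want from an optional field.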

Seen in

  • sources/2026-04-21-planetscale-tracking-index-usage-with-insights — Rafer Hazen (2024-08-14) canonicalises the pattern via PlanetScale's index-usage-tracking feature in Insights. Hook: InnoDB's index_init(). Per-query structure: appended index names attached to the server thread context. Sidecar transport: "we return the list of used indexes in the final packet returned by MySQL to the client, and ultimately to VTGate, Vitess's query proxying layer." Downstream aggregation: VTGate aggregates per fingerprint and emits every 15 seconds into the Insights Kafka pipeline. Load-bearing result: "aggregate the time series count of indexes used for 100% of queries with negligible overhead in MySQL."