PATTERN Cited by 1 source

Handler-hook sidecar telemetry

Problem. You want per-query telemetry from a production database — the "what actually happened during this single execution" granularity — at 100% coverage, without extra round-trips, without sampling, without significant per-query overhead, and without polling external counter tables.

Solution. Intercept at the storage-engine handler (or equivalent narrow-waist abstraction) to capture the telemetry datum during query execution. Store it in a per-query data structure attached to the server thread's context. When the query completes, bolt the datum onto the existing wire-protocol response packet as a sidecar field. Let the proxy / client layer parse and aggregate it downstream.

Canonicalised by PlanetScale's Insights for per-query index-usage telemetry: hook InnoDB's index_init() callback, accumulate the per-query used-index set in the thread-local THD context, emit it in the final MySQL wire-protocol packet, let VTGate aggregate per fingerprint and ship to the telemetry pipeline every 15 seconds. (Source: sources/2026-04-21-planetscale-tracking-index-usage-with-insights.)
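Stripped of MySQL internals, the capture stage can be sketched in Go. This is a simulation under stated assumptions, not PlanetScale's actual C++ patch: `QueryContext` stands in for the per-query structure on the thread's THD context, and `OnIndexInit` for the InnoDB handler hook.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// QueryContext is the per-query data structure attached to the server
// thread's context. It accumulates the telemetry datum (here: the set
// of indexes touched) while the query executes. Illustrative name,
// not MySQL's THD.
type QueryContext struct {
	usedIndexes map[string]struct{}
}

func NewQueryContext() *QueryContext {
	return &QueryContext{usedIndexes: map[string]struct{}{}}
}

// OnIndexInit plays the handler hook: the storage engine calls it each
// time it opens an index for this query. Set semantics keep the datum
// small and bounded even if the hook fires many times per query.
func (q *QueryContext) OnIndexInit(indexName string) {
	q.usedIndexes[indexName] = struct{}{}
}

// Sidecar renders the datum as the small field bolted onto the final
// response packet when the query completes.
func (q *QueryContext) Sidecar() string {
	names := make([]string, 0, len(q.usedIndexes))
	for n := range q.usedIndexes {
		names = append(names, n)
	}
	sort.Strings(names)
	return strings.Join(names, ",")
}

func main() {
	ctx := NewQueryContext()
	// The engine fires the hook as it walks the query plan.
	ctx.OnIndexInit("users_by_email")
	ctx.OnIndexInit("PRIMARY")
	ctx.OnIndexInit("users_by_email") // duplicate firings collapse
	fmt.Println(ctx.Sidecar())        // → PRIMARY,users_by_email
}
```

The hot-path cost is one map insert per hook firing; serialisation happens once, at query end.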

The three-stage shape

┌─────────────────┐    handler hook     ┌─────────────────┐
│ Storage engine  │────── fires ──────▶ │ Per-query       │
│ (e.g. InnoDB)   │  (index_init, etc)  │ data structure  │
└─────────────────┘                     │ on thread ctx   │
                                        └────────┬────────┘
                                                 │ on query end
┌─────────────────┐    sidecar field    ┌─────────────────┐
│ Client / proxy  │◀────  wire packet  ─│ Response packet │
│ (e.g. VTGate)   │   (final packet)    │ (final/trailer) │
└────────┬────────┘                     └─────────────────┘
         │ aggregate per fingerprint
┌─────────────────┐
│ Telemetry pipe  │
│ (Kafka, ...)    │
└─────────────────┘

Why this shape

The composition gives you a set of properties that neither of the two natural alternatives achieves on its own:

| Approach | Coverage | Overhead | Latency | Integration |
| --- | --- | --- | --- | --- |
| External polling (e.g. performance_schema counters via a scraper) | Sampled, cumulative | Scraper cost + counter overhead | Seconds to minutes (scrape interval) | Works against unmodified server |
| Sidecar trace exporter (OpenTelemetry-style, per-query span) | 100% or sampled | Per-query export call | Sub-second | Requires serialiser + sidecar endpoint |
| Handler-hook + wire-protocol sidecar (this pattern) | 100%, per-execution | Zero extra round-trips | Zero (rides on result packet) | Requires engine fork |

The cost is that you own a fork of the server (or an engine). The benefit is that your telemetry has the finest possible granularity and the lowest possible overhead.

When to use it

  • You already own the server / storage-engine distribution (PlanetScale's MySQL fork, Vitess's engine integration, a proprietary engine you ship).
  • You need per-query datum granularity — set membership, resource accounting, per-request identity — that summary counters cannot provide.
  • You care about no extra round-trips — telemetry piggybacks on the existing response flow.
  • The datum is small (tens to hundreds of bytes), so the sidecar doesn't materially bloat response sizes.
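The size constraint can be enforced at encode time rather than hoped for. A hedged sketch in Go; the 192-byte cap and the one-byte length prefix are illustrative choices, not MySQL's wire format:

```go
package main

import (
	"fmt"
	"strings"
)

// maxSidecarBytes bounds the telemetry payload so it stays in the
// "tens to hundreds of bytes" regime. Illustrative value.
const maxSidecarBytes = 192

// encodeSidecar joins the per-query index set into a length-prefixed
// field. A pathological query (hundreds of indexes touched) gets
// truncated rather than allowed to bloat the response packet.
func encodeSidecar(indexNames []string) []byte {
	payload := strings.Join(indexNames, ",")
	if len(payload) > maxSidecarBytes {
		payload = payload[:maxSidecarBytes]
	}
	// One length byte, then the payload: a field a reader can skip
	// wholesale without understanding its contents.
	out := make([]byte, 0, len(payload)+1)
	out = append(out, byte(len(payload)))
	return append(out, payload...)
}

func main() {
	b := encodeSidecar([]string{"PRIMARY", "users_by_email"})
	fmt.Println(int(b[0]), string(b[1:]))
}
```

Hard-capping at the producer keeps the "zero-cost piggyback" premise honest regardless of query shape.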

When not to use it

  • You don't own the server distribution. Upstream MySQL / Postgres / etc. do not expose a stable handler-hook API for this pattern, so patching downstream couples your telemetry to your fork's maintenance burden.
  • The datum is large — streaming row-level attribution data in every response packet is a bandwidth tax.
  • You need write-path coverage and the response-packet format doesn't have a good slot for the sidecar on write responses. (This is why PlanetScale's Insights index-usage telemetry is SELECT-only — the wire-protocol sidecar rides in result-set packets, and UPDATE / DELETE responses have different framing.)
  • The hook has high firing frequency — per-row hooks are almost always too hot to decorate with per-query-data-structure writes.

Related concepts

  • Async Kafka publication for telemetry (concepts/async-kafka-publication-for-telemetry) — the downstream complement: VTGate doesn't synchronously emit to Kafka on every query; it buffers to an in-memory queue and flushes asynchronously. Handler-hook-sidecar captures cheaply on the hot path; async publication ships it off the hot path.
  • Per-pattern time series (concepts/per-pattern-time-series) — the aggregation substrate on top of the sidecar data. The sidecar yields per-execution data; the time-series surface rolls it up per pattern per interval.
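The division of labour between hot-path capture and off-path publication can be sketched as a proxy-side aggregator. All names here are illustrative; the `sink` callback and the interval stand in for VTGate's 15-second Kafka flush:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Aggregator rolls per-query sidecar data up per query fingerprint
// and ships it off the hot path on a fixed interval, so no query
// ever waits on the telemetry pipeline.
type Aggregator struct {
	mu     sync.Mutex
	counts map[string]map[string]int // fingerprint -> index -> count
}

func NewAggregator() *Aggregator {
	return &Aggregator{counts: map[string]map[string]int{}}
}

// Record runs on the hot path once per completed query. It only
// touches an in-memory map under a short lock.
func (a *Aggregator) Record(fingerprint string, usedIndexes []string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	m := a.counts[fingerprint]
	if m == nil {
		m = map[string]int{}
		a.counts[fingerprint] = m
	}
	for _, idx := range usedIndexes {
		m[idx]++
	}
}

// Flush swaps the buffer out and returns it; the caller hands the
// result to the telemetry pipeline (Kafka, in PlanetScale's case).
func (a *Aggregator) Flush() map[string]map[string]int {
	a.mu.Lock()
	defer a.mu.Unlock()
	out := a.counts
	a.counts = map[string]map[string]int{}
	return out
}

// Run drains the buffer on an interval, off the hot path.
func (a *Aggregator) Run(interval time.Duration, sink func(map[string]map[string]int)) {
	for range time.Tick(interval) {
		sink(a.Flush())
	}
}

func main() {
	agg := NewAggregator()
	agg.Record("select * from users where email = ?", []string{"users_by_email"})
	agg.Record("select * from users where email = ?", []string{"users_by_email"})
	fmt.Println(agg.Flush()["select * from users where email = ?"]["users_by_email"]) // → 2
}
```

The swap-and-replace in `Flush` keeps the lock hold time constant regardless of how much data the interval accumulated.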

Caveats

  • Fork maintenance burden — patches against internal storage-handler APIs can break on upstream version bumps. Rebase discipline required.
  • ABI fragility — handler APIs are semi-private; signature drift between upstream MySQL / MariaDB / Percona forks means patches are fork-specific.
  • Wire-protocol bloat — payload must be bounded and small. Large per-query telemetry sidecars violate the "zero-cost piggyback" premise of the pattern.
  • Client-side parsing — every client / proxy that reads server responses needs to be aware of the sidecar field. Non-aware clients ignore it; non-aware intermediaries may choke if the field disrupts wire-protocol parsing. Implementation discipline: embed the sidecar in a field that unaware clients gracefully ignore (e.g. a post-status OK-packet extension on the tail).
  • SELECT-only coverage in PlanetScale's implementation — the wire-protocol sidecar rides in the final packet of result-set queries. UPDATE / DELETE paths that return OK_Packet with affected_rows instead of a result set would need a different sidecar slot — not implemented in Insights as of 2024-08-14.
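The "unaware clients gracefully ignore it" discipline from the parsing caveat can be sketched as a length-prefixed trailer: aware readers decode it, while readers that stop at the base protocol's declared packet length never see it. Illustrative Go, not MySQL's actual OK-packet layout:

```go
package main

import "fmt"

// appendTrailer bolts a length-prefixed sidecar onto an existing
// response packet without disturbing the original bytes, so a client
// that only reads the declared packet body never touches it.
func appendTrailer(packet, sidecar []byte) []byte {
	out := append([]byte{}, packet...)
	out = append(out, byte(len(sidecar)))
	return append(out, sidecar...)
}

// readSidecar is what an aware proxy does: look past the original
// body for the trailer. bodyLen is the length the base protocol
// already declares for the packet.
func readSidecar(packet []byte, bodyLen int) ([]byte, bool) {
	if len(packet) <= bodyLen {
		return nil, false // no trailer: plain upstream packet
	}
	n := int(packet[bodyLen])
	if bodyLen+1+n > len(packet) {
		return nil, false // malformed trailer: ignore it
	}
	return packet[bodyLen+1 : bodyLen+1+n], true
}

func main() {
	base := []byte{0x00, 0x01, 0x02} // stand-in for an OK-packet body
	wire := appendTrailer(base, []byte("PRIMARY"))
	// An unaware client reads len(base) bytes and stops; an aware
	// proxy looks for the trailer:
	sc, ok := readSidecar(wire, len(base))
	fmt.Println(ok, string(sc)) // → true PRIMARY
}
```

The decoder treats any malformed or absent trailer as "no sidecar", which is the failure mode you want from an optional field.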

Seen in

  • sources/2026-04-21-planetscale-tracking-index-usage-with-insights — Rafer Hazen (2024-08-14) canonicalises the pattern via PlanetScale's index-usage-tracking feature in Insights. Hook: InnoDB's index_init(). Per-query structure: appended index names attached to the server thread context. Sidecar transport: "we return the list of used indexes in the final packet returned by MySQL to the client, and ultimately to VTGate, Vitess's query proxying layer." Downstream aggregation: VTGate aggregates per fingerprint and emits every 15 seconds into the Insights Kafka pipeline. Load-bearing result: "aggregate the time series count of indexes used for 100% of queries with negligible overhead in MySQL."