CONCEPT
Async Kafka publication for telemetry¶
Async Kafka publication for telemetry is the pattern where a hot-path producer (query serving layer, RPC handler, application thread) writes telemetry to a bounded in-memory buffer and an independent flusher thread drains the buffer to Kafka asynchronously. The hot path never blocks on Kafka availability; the buffer absorbs transient Kafka outages up to its capacity; observations beyond the capacity are dropped.
(Source: sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights.)
Why async for telemetry¶
The three load-bearing properties:
- Per-query overhead must be negligible. A hot-path producer emitting one telemetry record per query cannot wait on a broker RTT — even a 1 ms synchronous round trip on every query adds percent-level overhead on queries that already run in a few milliseconds.
- Telemetry must not cause unavailability of the thing it is observing. If Kafka is down and the producer blocks, the observed system is down with it. Telemetry is a nice-to-have; serving is a must-have. Availability-over-data-completeness is the correct trade-off on this axis.
- Transient Kafka unavailability is expected. Kafka clusters reboot, elect leaders, rebalance partitions. The buffer must absorb seconds-to-minutes of producer-side disconnects without losing observations that happened during the gap.
Canonical implementation: bounded memory buffer + async flush¶
Rafer Hazen, PlanetScale, 2023-08-10: "Data submitted in VTGate is published to a bounded memory buffer and flushed to Kafka asynchronously. Asynchronous publication minimizes per-query overhead and ensures we continue to serve queries even during a Kafka outage. We guard against a temporary Kafka unavailability by buffering up to 5MB, which will be sent when Kafka becomes available again."
Mechanism:
- Hot path appends a serialised record to a bounded in-memory buffer (ring buffer / blocking queue / lock-free SPSC).
- Independent flusher thread reads from the buffer and writes batched records to Kafka.
- When the buffer is full, producers drop the record rather than block (or apply backpressure selectively — the choice is product-dependent).
- PlanetScale's canonical VTGate-side budget is 5 MB per VTGate.
Size of the buffer¶
The 5 MB figure is an operator choice balancing:
- Too small → records drop during ordinary Kafka hiccups (leader election, network blip, broker restart).
- Too large → process-memory pressure on the hot-path-producer process; stale observations (if Kafka is down for minutes, what you eventually flush is minutes old and no longer operationally useful).
PlanetScale sizes the buffer to absorb a typical Kafka blip (seconds to low minutes) without eating meaningful VTGate memory.
Composition: drops under failure are visible¶
If the hot path drops telemetry when the buffer is full, the drop is itself an observable event: the count of dropped telemetry records should be exported as a metric (ironically, via Prometheus or some other second path that does not depend on Kafka). This closes the loop: operators can see when Kafka was unavailable long enough to fill the buffer, even though the individual dropped records are gone.
Contrast: sync publish¶
The synchronous alternative — producers write records to Kafka directly on the hot path and wait for the ack — fails all three properties: it adds a broker RTT to every query, makes the observed system unavailable during Kafka outages, and provides no absorption of transient outages.
Async-plus-buffer is the canonical tradeoff for any observability-grade producer.
Seen in¶
- sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights — canonical wiki disclosure. Hazen canonicalises VTGate's 5 MB per-instance async buffer as a primary design goal: "the instrumentation should never slow your database down or cause unavailability."