CONCEPT Cited by 2 sources
Kafka consumer backlog¶
Definition¶
Kafka consumer backlog (a.k.a. consumer lag) is the offset delta between a partition's latest produced message and the consumer group's most recently committed offset — the count of messages that have been published but not yet processed by the consumer. Backlog is the canonical downstream-throughput backpressure signal for a Kafka-buffered pipeline: when the downstream consumer cannot keep up with the upstream producer, backlog grows.
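The offset-delta definition can be sketched as a small helper. The function names are illustrative; in practice the two offset maps come from Kafka's admin/consumer APIs (log-end offsets vs. committed offsets):

```python
def partition_lag(log_end_offset: int, committed_offset: int) -> int:
    """Messages produced to one partition but not yet committed
    by the consumer group (clamped at zero)."""
    return max(0, log_end_offset - committed_offset)

def total_backlog(log_end: dict, committed: dict) -> int:
    """Sum lag across all partitions of a topic; a partition the
    group has never committed to counts from offset 0."""
    return sum(
        partition_lag(log_end[p], committed.get(p, 0))
        for p in log_end
    )

# Partition 0: 1000 produced / 990 committed; partition 1: fully drained.
backlog = total_backlog({0: 1000, 1: 500}, {0: 990, 1: 500})  # -> 10
```

A stable pipeline, in these terms, is one where `total_backlog` oscillates near zero across polls.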
Why it matters¶
A stable pipeline has backlog oscillating around zero — the consumer drains as fast as the producer writes. Growing backlog means one of:
- Downstream storage bottleneck — the sink (database, blob store) is the binding constraint.
- Consumer concurrency ceiling — all worker threads busy, waiting on downstream I/O.
- Consumer-process count too low — not enough parallelism to match producer rate.
- Upstream volume spike — transient; drains once volume normalises.
Category 1 is the most common in write-heavy pipelines: consumers are I/O-bound on the write path and cannot go faster without a substrate change.
Backlog as storage-latency proxy¶
Because an I/O-bound consumer's throughput is threads ÷ per-write latency, any improvement in per-write latency directly reduces backlog. This is the chain PlanetScale observed in the 2025-03-11 Insights-to-Metal migration:
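The throughput relation above can be written out directly. This is a sketch; the function names are illustrative, not from the source:

```python
def max_drain_rate(threads: int, per_write_latency_s: float) -> float:
    """Upper bound on writes/sec for an I/O-bound consumer fleet:
    each thread completes one write every per_write_latency_s seconds."""
    return threads / per_write_latency_s

def backlog_growth_rate(produce_rate: float, threads: int,
                        per_write_latency_s: float) -> float:
    """Messages/sec added to the backlog; negative means it is draining."""
    return produce_rate - max_drain_rate(threads, per_write_latency_s)
```

Note the lever: halving per-write latency doubles `max_drain_rate` without touching the thread count, which is exactly the substrate-swap effect described next.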
"This resulted in a lower average backlog in our Kafka consumers, and has given us additional capacity to handle increasing message volume in the future."
(Source: sources/2026-04-21-planetscale-upgrading-query-insights-to-metal.)
The Insights pipeline — 32 consumers × 25 writer threads = 800 concurrent writer threads draining 10k writes/sec into 8 MySQL shards — had its per-write I/O latency collapse on the Metal substrate swap. Same thread count, faster draining, lower average backlog, and headroom for future volume growth on the same hardware. The backlog metric was the load-bearing operational signal that the upgrade worked.
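The pipeline's published numbers pin down a per-write latency budget. The 80 ms figure below is derived arithmetic from the 800-thread / 10k-writes-per-second figures, not a number stated in the source:

```python
threads = 32 * 25          # 32 consumers x 25 writer threads = 800
target_rate = 10_000       # writes/sec across the fleet

per_thread_rate = target_rate / threads    # 12.5 writes/sec per thread
latency_budget_s = 1 / per_thread_rate     # 0.08 s = 80 ms per write

# Any substrate that holds average per-write latency below this budget
# drains faster than the producers write, so average backlog falls.
```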
Capacity-headroom corollary¶
A pipeline running at 100% drain-rate against its storage ceiling has no headroom — any upstream volume spike becomes permanent backlog. Dropping per-write latency converts the same hardware into headroom, because max_throughput = threads ÷ per_write_latency rises as latency falls. PlanetScale's "additional capacity to handle increasing message volume in the future" is this headroom-as-multiplier framing.
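The multiplier framing in numbers, as a sketch. The before/after latencies are hypothetical stand-ins for an EBS-to-local-NVMe swap; the source does not publish latency figures:

```python
def max_throughput(threads: int, per_write_latency_s: float) -> float:
    return threads / per_write_latency_s

# Illustrative latencies only -- the 4x multiplier is not from the source.
before = max_throughput(800, 0.080)   # 10_000 writes/sec: zero headroom
                                      # against a 10k/sec producer
after = max_throughput(800, 0.020)    # 40_000 writes/sec, same threads

headroom_multiplier = after / before  # 4.0x capacity on unchanged hardware
```

The thread count never changes; the ceiling moves because the denominator did.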
Relationship to backpressure¶
Kafka consumer backlog is an observability signal for the underlying backpressure that the downstream storage is imposing on the consumer. Unlike explicit backpressure mechanisms (token-bucket rate limiters, admission control), Kafka's unbounded partition log absorbs the mismatch silently — the producer keeps writing, the consumer falls further behind, and the only visible symptom is backlog growth. Crossing a backlog threshold triggers alerting; the operational response is either (a) scale consumers, (b) fix downstream throughput, or (c) drop messages.
The Metal migration is an instance of response (b) — a substrate-level throughput fix via direct-attached NVMe.
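A minimal sketch of the threshold-alerting step described above. The threshold, poll count, and class name are illustrative; requiring several consecutive breaches filters out the transient upstream-spike case (category 4) that drains on its own:

```python
from collections import deque

class BacklogMonitor:
    """Fires when backlog stays above a threshold for N consecutive
    polls, so a one-poll producer spike does not page anyone."""

    def __init__(self, threshold: int, sustained_polls: int = 3):
        self.threshold = threshold
        self.window = deque(maxlen=sustained_polls)

    def observe(self, backlog: int) -> bool:
        """Record one poll; return True when the alert should fire."""
        self.window.append(backlog)
        return (len(self.window) == self.window.maxlen
                and all(b > self.threshold for b in self.window))

mon = BacklogMonitor(threshold=100_000)
# One spike at poll 2 does not alert; three sustained breaches do.
alerts = [mon.observe(b)
          for b in [50_000, 150_000, 40_000, 120_000, 130_000, 140_000]]
# alerts -> [False, False, False, False, False, True]
```

Once the alert fires, the three responses above apply; only (b) changes the drain-rate ceiling itself.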
Seen in¶
- sources/2026-04-21-planetscale-upgrading-query-insights-to-metal — canonical backlog-as-storage-latency-proxy disclosure. The Insights Kafka pipeline saw "lower average backlog in our Kafka consumers" after the EBS-to-Metal substrate swap — no application, architecture, or sharding changes, just lower per-write I/O latency. The backlog metric confirmed the downstream-storage-bound diagnosis and was the operational signal the migration landed correctly.
- sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights — the 2023-08-10 architectural sibling. Describes the Insights Kafka → MySQL pipeline: two topics (aggregate every 15s + slow-query events immediately), deterministic keying for partition affinity, bounded 5 MB in-memory buffer, async flush. Backlog is the implicit scalability budget the pipeline is engineered around.