Batching latency trade-off

Definition

The batching latency trade-off is the explicit exchange a producer makes when it groups records into batches: higher throughput is paid for with higher per-record latency. The linger.ms knob in Kafka/Redpanda clients is the tuning dial — linger.ms=0 means "dispatch as soon as you can; form batches only from whatever records happen to be available at dispatch time"; linger.ms=N means "wait up to N ms for more records to arrive before dispatching."

Redpanda's 2024-11-19 explainer frames both sides:

"Given that linger.ms specifies the maximum additional time to wait for a larger batch, it follows that the average latency would increase by half of that. Sometimes, messages are unlucky and arrive just as a new batch is created. And sometimes, messages are lucky and make it into a batch right before it's sent (and don't wait as long)." (Source: sources/2024-11-19-redpanda-batch-tuning-in-redpanda-for-optimized-performance-part-1)

Trigger logic

Canonicalised as pseudo-code from the Redpanda explainer:

if client not at max-in-flight cap:
    # either trigger fires: time (linger elapsed) or size (batch would overflow)
    if time since batch opened >= linger.ms or next message would exceed batch.size:
        close_and_send_current_batch()
else:
    # at the in-flight cap: a full batch is closed and queued, but not sent yet
    if next message would exceed batch.size:
        close_and_enqueue_current_batch()
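
The trigger logic above can be sketched as a runnable toy. The class, its method names, and the byte accounting are illustrative simplifications, not a real client API:

```python
class Batcher:
    """Toy producer-side batcher with the two dispatch triggers:
    a time trigger (linger_ms) and a size trigger (batch_size bytes)."""

    def __init__(self, linger_ms: float = 5, batch_size: int = 16384):
        self.linger_ms = linger_ms
        self.batch_size = batch_size
        self.batch = []          # records in the currently open batch
        self.bytes = 0           # bytes accumulated in the open batch
        self.opened_at = None    # timestamp of the first record in the batch

    def add(self, record: bytes, now_ms: float):
        """Append a record; return a closed batch if the size trigger fired."""
        closed = None
        # Size trigger: the next record would overflow the open batch.
        if self.batch and self.bytes + len(record) > self.batch_size:
            closed = self._close()
        if not self.batch:
            self.opened_at = now_ms
        self.batch.append(record)
        self.bytes += len(record)
        return closed

    def poll(self, now_ms: float):
        """Time trigger: close the batch once linger_ms has elapsed."""
        if self.batch and now_ms - self.opened_at >= self.linger_ms:
            return self._close()
        return None

    def _close(self):
        closed, self.batch, self.bytes = self.batch, [], 0
        return closed
```

A record that would overflow the batch first forces the current batch out (size trigger), then opens a fresh batch; a periodic `poll` enforces the linger deadline (time trigger).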

Two thresholds compose:

  • Time trigger (linger.ms) — bounds the maximum latency added to a record from sitting in a batch waiting. Records that arrive first wait the longest (up to linger.ms); records that arrive last wait essentially zero. Average ~linger.ms / 2.
  • Size trigger (batch.size) — bounds the maximum bytes per batch. Under high-rate workloads, this is the dominant trigger — the batch fills before linger.ms elapses.

Whichever threshold fires first dispatches the batch.
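
The ~linger.ms/2 average for the time trigger can be checked with a quick simulation. This assumes uniform arrivals inside a single linger window and ignores the size trigger entirely:

```python
import random

def added_latency(linger_ms: float, arrivals: list[float]) -> list[float]:
    """Per-record wait when only the time trigger fires: the batch opens at
    the first arrival and dispatches linger_ms later; each record waits
    from its own arrival until dispatch."""
    dispatch = arrivals[0] + linger_ms
    return [dispatch - t for t in arrivals if t <= dispatch]

random.seed(1)
linger = 5.0
# One "unlucky" record opens the batch; the rest spread uniformly in the window.
arrivals = sorted([0.0] + [random.uniform(0, linger) for _ in range(10_000)])
waits = added_latency(linger, arrivals)
avg = sum(waits) / len(waits)
print(f"average added latency ≈ {avg:.2f} ms (linger.ms = {linger})")  # ≈ 2.5
```

The first record waits the full `linger`, the last nearly zero, and the uniform spread averages out to half the window.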

Two regimes

The sign of the trade-off depends on whether the broker is CPU-saturated:

Normal regime (broker below saturation)

  • linger.ms=0 ⇒ minimum latency, smallest batches, highest request rate at the broker.
  • linger.ms=5 ⇒ ~2.5 ms added to average producer latency, larger batches, lower request rate at the broker, higher throughput per unit of broker CPU.

The classic textbook throughput-vs-latency trade-off. Pick your point on the curve.

Saturated regime (broker CPU near ceiling, internal queue growing)

The Redpanda explainer's counterintuitive finding:

"As the CPU becomes saturated on a broker due to an extremely high request rate, tasks can stack up in the broker's internal work queue. At this point, the backlog of produce requests can impact other internal housekeeping operations as they contend for CPU time. When this backlog becomes high, a produce request will take longer to process because it has to wait for its turn in the queue — the cause of increased latency."

In this regime, more batching = less latency, because reducing request rate (by batching harder) reduces the broker's internal queue backlog and the queueing tax that dominates per-request latency:

"By tuning the effective batch size to reduce the request rate, you reduce the CPU impact, the average size of that backlog, and the latency impact of that backlog. This may seem counterintuitive. Modest adjustments of the producer linger time can ease the saturation enough to allow tail latencies to drop significantly."

The sign-flip happens because:

  • Under saturation, per-request latency = (request processing time) + (queueing wait time).
  • Queueing wait grows super-linearly with utilisation (the classic M/M/1 queueing-theory result: wait time → ∞ as ρ → 1).
  • Reducing request rate (via linger.ms) moves utilisation away from the asymptote, shrinking queueing wait faster than the added producer-side linger grows total latency.
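
A worked instance of the sign-flip, using the M/M/1 mean-time-in-system formula W = 1/(μ − λ). The service time and request rates here are made-up illustrative numbers, not measurements from any real broker:

```python
def mm1_wait_ms(service_ms: float, rate_per_s: float) -> float:
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    mu = 1000.0 / service_ms       # requests the broker can serve per second
    lam = rate_per_s               # offered request rate
    assert lam < mu, "queue is unstable at or above saturation"
    return 1000.0 / (mu - lam)     # result in ms

service_ms = 0.9                   # hypothetical per-request service time
# linger.ms=0: many tiny requests push utilisation near the ceiling (rho ~ 0.99)
print(f"{mm1_wait_ms(service_ms, 1100):.1f} ms")   # ≈ 90 ms
# a modest linger halves the request rate, moving far from saturation
print(f"{mm1_wait_ms(service_ms, 550):.1f} ms")    # ≈ 1.8 ms
```

Halving the request rate cuts the queueing wait from ~90 ms to under 2 ms; the ~2.5 ms the producer adds by lingering is a bargain against that.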

This composes with the broader tail-latency-at-scale framing but is a distinct mechanism: classic tail-latency-at-scale is about fan-out amplification and hedged requests; this is about reducing request rate to un-saturate a bottleneck resource.

Naturally-bursty traffic amplifies batching effectiveness

Redpanda's explainer also names a second-order effect:

"Messages aren't often distributed over time perfectly outside of synthetic benchmarks — in the real world, events often arrive in bursts, which means that even a modest linger.ms time is likely to perform better than expected."

In burst-arriving workloads, the linger window captures most of the burst even at small linger.ms, because inter-arrival times within a burst are sub-linger.ms. A benchmark that generates Poisson-distributed traffic understates the effectiveness of small-linger batching.
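
A sketch of this second-order effect, comparing records-per-batch for Poisson versus bursty arrivals at the same mean rate. The burst shape and all rates here are invented for illustration:

```python
import random

def count_batches(arrivals: list[float], linger_ms: float) -> int:
    """Batches formed by a time-trigger-only batcher: a batch opens at the
    first uncaptured arrival and dispatches linger_ms later."""
    batches, dispatch = 0, float("-inf")
    for t in sorted(arrivals):
        if t > dispatch:           # previous batch already dispatched
            batches += 1
            dispatch = t + linger_ms
    return batches

random.seed(0)
rate, n = 1.0, 10_000              # 1 record/ms mean rate in both workloads
# Poisson arrivals: exponential inter-arrival times
t, poisson = 0.0, []
for _ in range(n):
    t += random.expovariate(rate)
    poisson.append(t)
# Bursty arrivals: same mean rate, but records land in tight bursts of 50
t, bursty = 0.0, []
while len(bursty) < n:
    t += random.expovariate(rate / 50)             # gap between bursts
    bursty.extend(t + 0.01 * i for i in range(50))

linger = 5.0
print("records/batch, Poisson:", n / count_batches(poisson, linger))
print("records/batch, bursty: ", n / count_batches(bursty, linger))
```

With a 5 ms linger, the bursty workload packs far more records per batch than the Poisson one, because a whole burst fits inside one linger window.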

Defaults in practice

  Client               linger.ms default
  Java KafkaProducer   0 ms (no batching)
  librdkafka           5 ms
  Other libraries      Varies — "check the defaults for your preferred library."

The library-dependent defaults are a frequent source of production surprise: a team porting a pipeline from the Java client to a librdkafka-based client silently switches batching regime, from no linger at all to a 5 ms window.
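
One way to picture the porting hazard. The helper below is illustrative; only the two defaults in the table above come from the source:

```python
DEFAULT_LINGER_MS = {      # client-library defaults from the table above
    "java": 0,
    "librdkafka": 5,
}

def effective_linger_ms(client: str, config: dict) -> int:
    """Return the linger.ms a producer will actually use: the explicit
    setting if present, otherwise the client library's default."""
    return config.get("linger.ms", DEFAULT_LINGER_MS[client])

# The same (empty) config means a different batching regime per library:
print(effective_linger_ms("java", {}))                  # -> 0: dispatch ASAP
print(effective_linger_ms("librdkafka", {}))            # -> 5: waits up to 5 ms
# Pinning linger.ms explicitly makes the behaviour portable:
print(effective_linger_ms("java", {"linger.ms": 5}))    # -> 5
```

Setting linger.ms explicitly in every producer config removes the dependency on whichever default the client library ships.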
