CONCEPT Cited by 5 sources
Batching latency trade-off¶
Definition¶
The batching latency trade-off is the explicit exchange a
producer makes when it groups records into batches: higher
throughput is paid for with higher per-record latency. The
linger.ms knob in Kafka/Redpanda clients is the tuning dial —
linger.ms=0 means "dispatch as soon as you can; form batches
only from whatever records happen to be available at dispatch
time"; linger.ms=N means "wait up to N ms for more records to
arrive before dispatching."
Redpanda's 2024-11-19 explainer frames both sides:
"Given that
linger.msspecifies the maximum additional time to wait for a larger batch, it follows that the average latency would increase by half of that. Sometimes, messages are unlucky and arrive just as a new batch is created. And sometimes, messages are lucky and make it into a batch right before it's sent (and don't wait as long)." (Source: sources/2024-11-19-redpanda-batch-tuning-in-redpanda-for-optimized-performance-part-1)
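The half-of-linger estimate can be sanity-checked with a toy simulation (names and numbers are illustrative, not from any client): records arriving uniformly at random inside an open batch's linger window wait linger/2 on average.

```python
import random

def average_added_latency(linger_ms: float, n_records: int, seed: int = 0) -> float:
    """Mean extra wait for records arriving uniformly inside a linger window."""
    rng = random.Random(seed)
    total_wait = 0.0
    for _ in range(n_records):
        arrival = rng.uniform(0.0, linger_ms)  # offset into the open batch's window
        total_wait += linger_ms - arrival      # "unlucky" early arrivals wait longest
    return total_wait / n_records

avg = average_added_latency(linger_ms=5.0, n_records=100_000)
# avg lands near 2.5 ms: half of linger.ms, matching the explainer's estimate
```

Lucky records (arriving just before dispatch) contribute waits near zero; unlucky ones (arriving as the batch opens) wait the full window.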
Trigger logic¶
Canonicalised as pseudo-code from the Redpanda explainer:
    if client is below the max-in-flight cap:
        if current linger > linger.ms OR next message would exceed batch.size:
            close_and_send_current_batch()
    else:
        # at the cap: the batch closes but queues locally instead of sending
        if next message would exceed batch.size:
            close_and_enqueue_current_batch()
Two thresholds compose:
- Time trigger (linger.ms): bounds the maximum latency added to a record from sitting in a batch waiting. Records that arrive first wait the longest (up to linger.ms); records that arrive last wait essentially zero. Average ~linger.ms / 2.
- Size trigger (batch.size): bounds the maximum bytes per batch. Under high-rate workloads, this is the dominant trigger: the batch fills before linger.ms elapses.
Whichever threshold fires first dispatches the batch.
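The two-trigger composition can be sketched as a toy accumulator (class and method names are invented for illustration; no real client exposes this exact shape):

```python
class BatchAccumulator:
    """Toy two-trigger batcher: dispatch on time OR size, whichever fires first."""

    def __init__(self, linger_ms: float, batch_size_bytes: int, send):
        self.linger_ms = linger_ms                 # time trigger
        self.batch_size_bytes = batch_size_bytes   # size trigger
        self.send = send                           # callback that ships a closed batch
        self.batch, self.batch_bytes, self.opened_at = [], 0, None

    def append(self, record: bytes, now_ms: float) -> None:
        # Size trigger: close the batch if this record would overflow it.
        if self.batch and self.batch_bytes + len(record) > self.batch_size_bytes:
            self._dispatch()
        if not self.batch:
            self.opened_at = now_ms                # first record opens the linger window
        self.batch.append(record)
        self.batch_bytes += len(record)

    def poll(self, now_ms: float) -> None:
        # Time trigger: close the batch once it has lingered linger_ms.
        if self.batch and now_ms - self.opened_at >= self.linger_ms:
            self._dispatch()

    def _dispatch(self) -> None:
        self.send(list(self.batch))
        self.batch, self.batch_bytes, self.opened_at = [], 0, None

sent = []
acc = BatchAccumulator(linger_ms=5.0, batch_size_bytes=100, send=sent.append)
for i in range(30):
    acc.append(b"x" * 10, now_ms=i * 0.1)  # high rate: 10-byte records 0.1 ms apart
# The size trigger fired twice (two full 100-byte batches) before any linger elapsed;
acc.poll(now_ms=7.0)                       # the time trigger flushes the remainder
```

Under the fast arrival schedule above, the size trigger dominates; the final partial batch only leaves when `poll` observes the linger deadline.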
Two regimes¶
The sign of the trade-off depends on whether the broker is CPU-saturated:
Normal regime (broker below saturation)¶
linger.ms=0 ⇒ minimum latency, smallest batches, highest request
rate at the broker.
linger.ms=5ms ⇒ ~2.5 ms added to average producer latency,
larger batches, lower request rate at the broker, higher
throughput per CPU.
The classic textbook throughput-vs-latency trade-off. Pick your point on the curve.
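Back-of-envelope arithmetic, with an invented 10,000 msg/s workload, shows why the broker's request rate falls so sharply on the batched side of the curve:

```python
# Invented steady workload: 10,000 msg/s from one producer to one partition.
msg_rate = 10_000                       # messages per second

# linger.ms=0, worst case: every message ships in its own produce request.
requests_at_zero_linger = msg_rate      # 10,000 requests/s at the broker

# linger.ms=5: each batch collects roughly 5 ms worth of arrivals.
msgs_per_batch = msg_rate * 5 / 1000    # ~50 messages per batch
requests_at_5ms_linger = msg_rate / msgs_per_batch  # ~200 requests/s

# A ~50x drop in broker request rate, bought with ~2.5 ms average added latency.
```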
Saturated regime (broker CPU near ceiling, internal queue growing)¶
The Redpanda explainer's counterintuitive finding:
"As the CPU becomes saturated on a broker due to an extremely high request rate, tasks can stack up in the broker's internal work queue. At this point, the backlog of produce requests can impact other internal housekeeping operations as they contend for CPU time. When this backlog becomes high, a produce request will take longer to process because it has to wait for its turn in the queue — the cause of increased latency."
In this regime, more batching = less latency, because reducing request rate (by batching harder) reduces the broker's internal queue backlog and the queueing tax that dominates per-request latency:
"By tuning the effective batch size to reduce the request rate, you reduce the CPU impact, the average size of that backlog, and the latency impact of that backlog. This may seem counterintuitive. Modest adjustments of the producer linger time can ease the saturation enough to allow tail latencies to drop significantly."
The sign-flip happens because:
- Under saturation, per-request latency = (request processing
time) + (queueing wait time).
- Queueing wait grows super-linearly with utilisation (classic
queueing-theory M/M/1 result — wait time → ∞ as ρ → 1).
- Reducing request rate (via linger.ms) moves utilisation away
from the asymptote, shrinking wait time faster than the added
producer-side linger grows it.
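The M/M/1 mean queueing delay, Wq = ρ/(μ − λ), makes the sign-flip concrete; the service and arrival rates below are invented for illustration, not broker measurements:

```python
def mm1_mean_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request waits in queue for an M/M/1 server: Wq = rho/(mu - lambda)."""
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError("utilisation >= 1: the queue grows without bound")
    return rho / (service_rate - arrival_rate)

mu = 10_000.0  # broker can service 10k produce requests/s (invented capacity)

saturated = mm1_mean_wait(9_900.0, mu)  # rho = 0.99 -> ~9.9 ms queued per request
relaxed = mm1_mean_wait(7_500.0, mu)    # rho = 0.75 -> ~0.3 ms queued per request
# Moving utilisation off the asymptote sheds ~9.6 ms of mean queueing delay,
# dwarfing the ~2.5 ms the extra linger adds; tail percentiles improve even more.
```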
This composes with the broader tail-latency-at-scale framing but is a distinct mechanism: classic tail-latency-at-scale is about fan-out amplification and hedged requests; this is about reducing request rate to un-saturate a bottleneck resource.
Naturally-bursty traffic amplifies batching effectiveness¶
Redpanda's explainer also names a second-order effect:
"Messages aren't often distributed over time perfectly outside of synthetic benchmarks — in the real world, events often arrive in bursts, which means that even a modest
linger.mstime is likely to perform better than expected."
In burst-arriving workloads, the linger window captures most of
the burst even at small linger.ms, because inter-arrival times
within a burst are sub-linger.ms. A benchmark that generates
Poisson-distributed traffic understates the effectiveness of
small-linger batching.
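A deterministic toy comparison (arrival schedules invented, size trigger ignored): the same 100 records over one second, once evenly spaced and once in ten tight bursts, batched greedily with a 5 ms linger window.

```python
def batch_count(arrivals_ms, linger_ms):
    """Greedy time-window batching: a batch closes linger_ms after its first record."""
    batches, window_end = 0, None
    for t in sorted(arrivals_ms):
        if window_end is None or t >= window_end:
            batches += 1                 # this record opens a fresh batch
            window_end = t + linger_ms
    return batches

uniform = [i * 10.0 for i in range(100)]  # one record every 10 ms for 1 s
bursty = [b * 100.0 + i * 0.5 for b in range(10) for i in range(10)]
# ten bursts of ten records, 0.5 ms apart inside each burst

# With a 5 ms linger the uniform stream never catches a second record per window
# (100 batches of 1), while each whole burst fits in one window (10 batches of 10).
```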
Defaults in practice¶
| Client | linger.ms default |
|---|---|
| Java KafkaProducer | 0 ms (no batching delay) |
| librdkafka | 5 ms |
| Other libraries | Varies; "check the defaults for your preferred library." |
The library-dependent defaults are a frequent source of production surprise: a team porting a pipeline from Java to a librdkafka-based language silently switches batching regime.
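One defensive habit that follows: set linger.ms explicitly instead of inheriting it. A sketch of a librdkafka-style config dict (the broker address is a placeholder, the values illustrative):

```python
# Pin batching explicitly so porting between clients (Java default 0 ms,
# librdkafka default 5 ms) cannot silently change the batching regime.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder
    "linger.ms": 5,         # explicit, not inherited from the library default
    "batch.size": 131072,   # explicit size trigger, in bytes
}
```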
Seen in¶
- sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming — extends the trade-off at the broker→analytical-sink altitude. Redpanda Connect's snowflake_streaming output in the 14.5 GB/s benchmark found that count-based batch triggers outperform byte-size-based triggers on the hot produce path, because byte_size enforcement requires per-message serialised-size computation whereas count is an integer compare. Canonical new pattern: patterns/count-over-bytesize-batch-trigger. Also reaffirms batching-feeds-analytical-commit latency attribution — 86% of the 7.49 s P99 end-to-end lived in the Snowflake commit path, with batch-trigger choice the first lever.
- sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda — reaffirms the batching trade-off in checklist voice: "Batching does mean intentionally introducing latency into the produce pipeline, but it's often a worthy tradeoff and can lead to lower latency overall since the broker is more efficient." Recommends raising linger.ms + max batch size; monitor per-topic average batch size + batch produce rate.
- sources/2024-11-19-redpanda-batch-tuning-in-redpanda-for-optimized-performance-part-1 — canonical wiki statement of the normal-vs-saturated regime trade-off and the pseudo-code trigger logic.
- sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2 — quantitative validation of the saturation-regime latency inversion. Three-round linger tuning on a CPU-saturated Redpanda Cloud BYOC cluster: p50 25 ms → < 1 ms, p99 128 ms → 17 ms, p99.999 490 ms → 130 ms. Every percentile improved at every round — the monotonic improvement across all percentiles is the signature of saturated-regime tuning. Also canonicalises scheduler queue length (vectorized_scheduler_queue_length) as the detector for which regime the cluster is in.
- sources/2026-03-30-redpanda-under-the-hood-redpanda-cloud-topics-architecture — the same trade-off one altitude down: the Cloud Topics subsystem batches records in memory across all partitions and topics before flushing a single PUT to object storage, with trigger "e.g., 0.25 seconds or 4 MB". This is time-or-bytesize at the broker-to-object-storage layer. The 250 ms window puts a floor on produce p99 latency for Cloud Topics — the feature is positioned for latency-tolerant workloads where this is acceptable.
Related¶
- systems/kafka, systems/redpanda — Kafka-API producers.
- patterns/batch-over-network-to-broker — the producer-side pattern.
- concepts/effective-batch-size — why the config knobs aren't enough.
- concepts/fixed-vs-variable-request-cost — the substrate economics that make batching matter.
- concepts/producer-backpressure-batch-growth — the adjacent counterintuitive finding: broker backpressure makes batches larger than the configured ceiling.
- concepts/tail-latency-at-scale — adjacent tail-latency framing; composes with this.