

Producer backpressure batch growth

Definition

Producer backpressure batch growth is the counterintuitive behaviour of Kafka-API producers under broker saturation: when the broker is heavily loaded, producers grow their batches past the configured batch.size ceiling rather than shrinking them. Canonicalised by Redpanda's 2024-11-19 batch-tuning explainer:

"[Kafka] clients have an internal mechanism for tracking the in-flight batches sent to the cluster and awaiting acknowledgment. If Redpanda is heavily loaded, a client with all of the 'max in flight' slots in use will experience a form of backpressure, such that the client will continue to add records into a queued batch even beyond the maximum batch size. This will result in increased batch sizes while the cluster is heavily loaded." (Source: sources/2024-11-19-redpanda-batch-tuning-in-redpanda-for-optimized-performance-part-1)

Mechanism

The Kafka client's transport pipeline has a bounded number of in-flight request slots (default 5 per broker connection via max.in.flight.requests.per.connection). When all slots are occupied:

  1. The client cannot dispatch a new batch — no free slot.
  2. Records continue to arrive from the application.
  3. The client has to put them somewhere — the only option is to keep enqueueing them into the currently-open batch (or queue them into a closed-but-waiting batch via an overflow path).
  4. The open batch grows past batch.size because the size threshold's dispatch action ("close and send") is blocked by the missing slot.

This behaviour is protective: the alternative would be to either drop records or block the producer thread. Enqueueing past batch.size preserves the ordering and durability contract at the cost of inflating batches.
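The steps above can be simulated with a toy model (all names and numbers here are illustrative, not the real client internals): a producer with a fixed in-flight window, fed either by a responsive broker that acks immediately or by a saturated one that never frees a slot.

```python
# Toy model of the mechanism above. Names and numbers are illustrative;
# this is not the real Kafka client code.
MAX_IN_FLIGHT = 5       # max.in.flight.requests.per.connection default
BATCH_SIZE = 16_384     # batch.size default, in bytes
RECORD = 1_000          # fixed record size for the simulation

def max_open_batch_bytes(saturated: bool, n_records: int = 200) -> int:
    """Feed n_records records and report the largest open batch seen."""
    in_flight = 0
    open_batch = 0
    largest = 0
    for _ in range(n_records):
        if open_batch + RECORD > BATCH_SIZE and in_flight < MAX_IN_FLIGHT:
            # Free slot available: close and dispatch at the ceiling.
            in_flight += 1
            open_batch = 0
        if not saturated:
            in_flight = 0  # a responsive broker acks immediately
        open_batch += RECORD
        largest = max(largest, open_batch)
    return largest

# Responsive broker: batches close at (or under) batch.size.
print(max_open_batch_bytes(saturated=False))  # 16000

# Saturated broker: all 5 slots fill and stay occupied, after which the
# open batch keeps growing far past batch.size.
print(max_open_batch_bytes(saturated=True))   # 120000
```

Note that the saturated run never drops a record: everything the application sends ends up in the (inflated) open batch, which is exactly the ordering- and durability-preserving trade described above.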

The pseudo-code decomposition

From Redpanda's explainer, the trigger logic branches on the in-flight-cap state:

if (client not at max-in-flight cap):
    # Normal regime: dispatch as soon as either threshold fires.
    if (current linger > linger.ms or next message would exceed batch.size):
        close_and_send_current_batch()
else:
    # Backpressure regime: can't dispatch; keep enqueueing
    # until the next message would overflow the current batch, then
    # queue up the closed batch for a future slot.
    if (next message would exceed batch.size):
        close_and_enqueue_current_batch()

In the normal regime, batch.size is a ceiling. In the backpressure regime, batch.size is the close-and-queue threshold — after which the next batch opens and continues growing, also potentially past the ceiling.
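The branch can be written as a small runnable function (hypothetical function and parameter names; a sketch of the decision logic, not the actual client source):

```python
def dispatch_action(linger_elapsed_ms: float, linger_ms: float,
                    batch_bytes: int, next_record_bytes: int,
                    batch_size: int, in_flight: int,
                    max_in_flight: int) -> str:
    """Decide what happens to the currently open batch."""
    would_overflow = batch_bytes + next_record_bytes > batch_size
    if in_flight < max_in_flight:
        # Normal regime: either threshold closes and sends the batch.
        if linger_elapsed_ms > linger_ms or would_overflow:
            return "close_and_send"
    else:
        # Backpressure regime: no free slot, so linger.ms is irrelevant;
        # only overflow closes the batch, and it is queued, not sent.
        if would_overflow:
            return "close_and_enqueue"
    return "keep_open"
```

For example, with a free slot and linger expired the batch is sent (`dispatch_action(6, 5, 1_000, 100, 16_384, 0, 5)` gives `"close_and_send"`), while at the in-flight cap the same expired linger leaves the batch open until overflow.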

Second-order effect: adding brokers can decrease batch size

Redpanda's explainer names the counterintuitive downstream effect:

"One consequence is that adding additional brokers to a loaded cluster can sometimes cause batch sizes to decrease since there is less backpressure."

The cycle:

  • Loaded cluster → produce responses delayed → max-in-flight slots stay occupied → producer queues past batch.size → batches are large (and batch.size is effectively not the cap).
  • Add brokers → cluster responds faster → slots free up → producer dispatches at batch.size again → batches shrink to the configured ceiling.

More cluster capacity leads to smaller batches, and therefore a higher request rate per unit of record rate. This runs against the usual intuition that "adding capacity makes everything better": here it shifts the bottleneck resource but leaves effective batching worse.
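A back-of-envelope illustration with assumed numbers: at a fixed record throughput, batches shrinking from an inflated size back to the configured ceiling multiplies the produce-request rate.

```python
# Assumed numbers for illustration only.
record_rate = 100 * 1024 * 1024        # 100 MiB/s of record bytes

inflated_batch = 64 * 1024             # loaded cluster: batches past the ceiling
configured_batch = 16 * 1024           # after adding brokers: batch.size again

print(record_rate // inflated_batch)    # 1600 produce requests/s
print(record_rate // configured_batch)  # 6400 produce requests/s
```

Same record rate, four times the requests: per-request overhead (network round trips, broker request handling) scales up even as the original bottleneck is relieved.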

Why this is not a producer-config problem

A common debugging trap: engineers see batches larger than batch.size and conclude the setting isn't working. It is working — broker-side saturation dictates batch size here, not the producer-side threshold. The fix is at the broker (add capacity) or at the pipeline (reduce producer rate), not at the producer config.

Relationship to classic backpressure

Classic backpressure is a slow-down signal: a slow consumer makes the fast producer stop. This is the inverse: the producer doesn't slow down; it grows its batches. The observable effect is the same (throughput adjusts to cluster capacity), but the mechanism is different:

  • Classic backpressure: producer is explicitly blocked waiting for capacity.
  • Kafka producer backpressure: producer continues accepting records but inflates batches because dispatch is stalled.

The client's ability to keep accepting records is bounded by buffer.memory: once total queued-batch memory exceeds it, classic blocking backpressure kicks in (the producer thread blocks in send(), and after max.block.ms the send fails with a timeout).
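A rough sizing sketch using Kafka's documented defaults (32 MiB buffer.memory, 16 KiB batch.size); the 4x inflation factor is an assumption for illustration:

```python
buffer_memory = 32 * 1024 * 1024   # buffer.memory default (32 MiB)
batch_size = 16 * 1024             # batch.size default (16 KiB)

# At the configured ceiling, this many batches fit in the buffer:
print(buffer_memory // batch_size)        # 2048

# Under backpressure with batches inflated to 4x the ceiling (assumed),
# the same memory budget holds far fewer queued batches:
print(buffer_memory // (4 * batch_size))  # 512
```

The blocking threshold is a byte total, so it is reached at the same total memory either way; inflation just means the buffer is consumed by fewer, larger batches.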
