
Batch over network to broker

Pattern

On the producer side of a messaging system, group many small records into one protocol batch before dispatching across the network. The receiving broker persists the batch as a single linear write, and downstream consumers fetch large contiguous chunks.

The goal is transport economics: amortise per-request overhead (TCP segments, broker bookkeeping, disk I/O syscalls) across many records.
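The amortisation argument is simple arithmetic; a back-of-envelope sketch (the overhead figure is an illustrative assumption, not a measurement):

```python
# Back-of-envelope amortisation of per-request overhead: a fixed cost
# (TCP segments, broker bookkeeping, disk I/O syscalls) is paid once per
# network request, so batching N records divides it by N.

def overhead_per_record(fixed_overhead_us: float, batch_size: int) -> float:
    """Fixed per-request cost spread across every record in the batch."""
    return fixed_overhead_us / batch_size

# Assume ~200 microseconds of fixed cost per produce request (illustrative).
print(overhead_per_record(200.0, 1))    # 200.0 us per record, unbatched
print(overhead_per_record(200.0, 500))  # 0.4 us per record in a 500-record batch
```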

Canonical production instance: Apache Kafka producer batching controlled by batch.size + linger.ms + max.in.flight.requests.per.connection.

Why it works

Kozlovski's Kafka-101 framing:

"Kafka's protocol groups messages together. This allows network requests to group messages together and reduce network overhead. The server, in turn, persists chunk of messages in one go — a linear HDD write. Consumers then fetch large linear chunks at once." (Source: sources/2024-05-09-highscalability-kafka-101)

Two amplifying properties:

  • Write side — the broker's write becomes a single linear HDD write, matching the sequential-I/O sweet spot (concepts/hdd-sequential-io-optimization).
  • Read side — consumers fetch large linear chunks that match pagecache prefetch boundaries.

The batch boundary therefore lines up with the OS's best-case I/O size on both the write and the read path.

Producer configuration

Kafka exposes three composable knobs:

  • batch.size — max bytes buffered per partition before dispatch. The byte-count trigger.
  • linger.ms — max wait time to accumulate a batch. The time trigger. linger.ms=0 means "batch only what is already available" (low latency, smaller batches); higher values trade latency for throughput.
  • max.in.flight.requests.per.connection — transport pipelining; how many batches can be in flight without response.

Taken together, these compose Kafka's producer-side batching primitive: a byte trigger plus a time window per partition, with transport pipelining layered on top.
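The interplay of the first two knobs can be sketched as a per-partition buffer with two independent dispatch triggers. This is a minimal illustration of the mechanism, not the Kafka client; all names are hypothetical:

```python
import time

class PartitionBatcher:
    """Sketch of per-partition batching: dispatch when buffered bytes
    reach batch_size OR when the oldest record has waited linger_ms."""

    def __init__(self, batch_size: int, linger_ms: float, send):
        self.batch_size = batch_size   # byte-count trigger (cf. batch.size)
        self.linger_ms = linger_ms     # time trigger (cf. linger.ms)
        self.send = send               # dispatch callback (one network request)
        self.buffer = []
        self.buffered_bytes = 0
        self.first_append = None       # monotonic time of oldest buffered record

    def append(self, record: bytes) -> None:
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        if self.first_append is None:
            self.first_append = time.monotonic()
        if self.buffered_bytes >= self.batch_size:
            self.flush()               # byte trigger fired

    def poll(self) -> None:
        # Called periodically by the I/O loop; fires the time trigger.
        if self.first_append is not None:
            waited_ms = (time.monotonic() - self.first_append) * 1000
            if waited_ms >= self.linger_ms:
                self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.send(b"".join(self.buffer))  # one contiguous chunk out
        self.buffer, self.buffered_bytes, self.first_append = [], 0, None
```

With `linger_ms=0`, `poll()` flushes whatever has already accumulated on every pass, matching the "batch only what is already available" semantics; a higher value holds the buffer open to trade latency for larger batches.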

What this pattern isn't

Batching by payload semantics — e.g., by total token count, by business-level grouping — is not what this pattern supports. Kafka batches by bytes/messages only, which is fine for transport economics but mismatched for application-specific batching. Payload-attribute batching is its own pattern (patterns/lightweight-aggregator-in-front-of-broker) and typically sits in front of Kafka rather than inside it — see sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference.

Trade-offs

  • Latency vs throughput — linger.ms is the explicit dial.
  • Batching amplifies blast radius on producer failure — a crashed producer loses any unsent in-memory batch; dispatch failures (as opposed to crashes) are mitigated via retries + idempotence.
  • Ordering per partition holds — batches within a partition are delivered in order; batches across partitions have no inter-partition ordering (as is true of Kafka in general).
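The mitigations above map onto standard Kafka producer configuration keys; a sketch of a throughput-leaning setup (the keys are the real producer configs, the values are illustrative, not recommendations):

```python
# Illustrative producer configuration: byte trigger, time trigger,
# pipelining depth, and the durability knobs for retried dispatches.
producer_config = {
    "batch.size": 65536,            # byte trigger: up to 64 KiB per partition batch
    "linger.ms": 10,                # time trigger: wait up to 10 ms to fill a batch
    "max.in.flight.requests.per.connection": 5,  # pipelining depth
    "enable.idempotence": True,     # broker de-duplicates retried batches
    "retries": 2147483647,          # retry failed dispatches (Integer.MAX_VALUE)
}
```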
