Batch over network to broker¶
Pattern¶
On the producer side of a messaging system, group many small records into one protocol batch before dispatching across the network. The receiving broker persists the batch as a single linear write, and downstream consumers fetch large contiguous chunks.
The goal is transport economics: amortise per-request overhead (TCP segments, broker bookkeeping, disk I/O syscalls) across many records.
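The amortisation argument can be made concrete with a back-of-the-envelope model (the numbers below are illustrative, not from the source): if each network request costs a fixed overhead plus a small marginal cost per record, batching n records divides the fixed overhead by n.

```python
# Back-of-the-envelope amortisation model (illustrative numbers only).
def per_record_cost_us(overhead_us: float, marginal_us: float, batch: int) -> float:
    """Effective cost per record when `batch` records share one request."""
    return overhead_us / batch + marginal_us

# Assume 500 us of fixed per-request overhead and 5 us of marginal work per record.
unbatched = per_record_cost_us(overhead_us=500.0, marginal_us=5.0, batch=1)
batched = per_record_cost_us(overhead_us=500.0, marginal_us=5.0, batch=100)
print(f"unbatched: {unbatched:.1f} us/record")  # 505.0 us/record
print(f"batched:   {batched:.1f} us/record")    # 10.0 us/record
```

At a batch of 100 the fixed overhead all but disappears from the per-record cost, which is the whole economic case for the pattern.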
Canonical production instance: Apache Kafka producer batching, controlled by `batch.size` + `linger.ms` + `max.in.flight.requests.per.connection`.
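As a sketch, the three knobs might appear together in a producer properties file like this (values are illustrative examples, not recommendations):

```properties
# Byte-count trigger: up to 64 KiB buffered per partition before dispatch.
batch.size=65536
# Time trigger: wait up to 10 ms to accumulate records into a batch.
linger.ms=10
# Pipelining: up to 5 batches in flight per broker connection.
max.in.flight.requests.per.connection=5
```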
Why it works¶
Kozlovski's Kafka-101 framing:
"Kafka's protocol groups messages together. This allows network requests to group messages together and reduce network overhead. The server, in turn, persists chunk of messages in one go — a linear HDD write. Consumers then fetch large linear chunks at once." (Source: sources/2024-05-09-highscalability-kafka-101)
Two amplifying properties:
- Write side — the broker's write becomes a single linear HDD write, matching the sequential-I/O sweet spot (concepts/hdd-sequential-io-optimization).
- Read side — consumers fetch large linear chunks that match pagecache prefetch boundaries.
The batch boundary therefore lines up with the OS's best-case I/O size on both the write and the read path.
Producer configuration¶
Kafka exposes three composable knobs:
- `batch.size` — max bytes buffered per partition before dispatch; the byte-count trigger.
- `linger.ms` — max wait time to accumulate a batch; the time trigger. `linger.ms=0` means "batch only what is already available" (low latency, smaller batches); higher values trade latency for throughput.
- `max.in.flight.requests.per.connection` — transport pipelining: how many batches can be in flight without a response.
Taken together, these compose Kafka's producer-side batching primitive: a byte-count trigger plus a time window per partition, with pipelining across batches.
What this pattern isn't¶
Batching by payload semantics — e.g., by total token count, by business-level grouping — is not what this pattern supports. Kafka batches by bytes/messages only, which is fine for transport economics but mismatched for application-specific batching. Payload-attribute batching is its own pattern (patterns/lightweight-aggregator-in-front-of-broker) and typically sits in front of Kafka rather than inside it — see sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference.
Trade-offs¶
- Latency vs throughput — `linger.ms` is the explicit dial.
- Batching amplifies blast radius on producer failure — a crashed producer loses any batches still buffered in memory; `retries` and idempotence cover transient send failures, not an unflushed buffer.
- Ordering per partition holds — batches within a partition are delivered in order; batches across partitions have no inter-partition ordering (as is true of Kafka in general).
Seen in¶
- sources/2024-05-09-highscalability-kafka-101 — canonical wiki statement of the producer-batching-for-network-economics pattern in Kafka.
Related¶
- systems/kafka
- patterns/uniform-buffer-batching — generalised batching primitive.
- patterns/lightweight-aggregator-in-front-of-broker — application-semantic batching layered in front of a transport-economics batcher.
- concepts/pagecache-for-messaging — why consumer fetches served from pagecache compose with this pattern.