
Zero-copy sendfile at broker

Pattern

When a messaging broker ships records from the tail of its on-disk log to a consumer's socket, use the OS's sendfile (or equivalent zero-copy primitive) to have the kernel copy data directly from pagecache into the socket buffer, bypassing the application's heap entirely.

Canonical production instance: Apache Kafka broker serving consumer fetches. For this to work, the bytes on disk must be byte-identical to the wire format so the kernel can splice them with no transformation — which is exactly the invariant Kafka maintains: "Kafka stores messages in a standardized binary format unmodified throughout the whole flow (producer ➡ broker ➡ consumer)." Kozlovski, sources/2024-05-09-highscalability-kafka-101.
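The pattern is small enough to sketch directly. In Python the syscall is exposed as `os.sendfile`; the helper below is illustrative (its name and the retry loop are not Kafka code) but shows the essential move: the kernel ships bytes from the log file's pagecache pages to the consumer's socket without the process ever holding them in a user-space buffer.

```python
import os

def serve_fetch(log_fd: int, sock_fd: int, offset: int, count: int) -> int:
    """Ship `count` bytes of the on-disk log, starting at `offset`,
    to the consumer's socket via the kernel's zero-copy path.
    The record bytes never enter this process's address space."""
    sent = 0
    while sent < count:
        # os.sendfile wraps sendfile(2); it may send fewer bytes
        # than requested, so loop until done or EOF.
        n = os.sendfile(sock_fd, log_fd, offset + sent, count - sent)
        if n == 0:  # hit the end of the log segment
            break
        sent += n
    return sent
```

Because the bytes are spliced verbatim, this only works if the on-disk bytes are already in wire format — the invariant quoted above.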

Why it wins (in theory)

  • Zero user-space copy. Without sendfile, the broker (JVM) reads N bytes from pagecache into the JVM heap, then writes those N bytes from the heap into the socket: two system calls (four user/kernel mode switches) plus two memory copies. sendfile collapses this to a single system call and at most one in-kernel copy, or none when the NIC supports scatter-gather DMA.
  • Reduced context switches. Fewer mode-switches at high request rates.
  • JVM heap stays small. The broker doesn't need heap room to buffer the records it's about to ship; pagecache is already the buffer.
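For contrast, the non-zero-copy path the first bullet describes looks like this (an illustrative Python sketch, not broker code): the payload makes a round trip through a user-space buffer that the sendfile path never allocates.

```python
import socket

def serve_fetch_copying(log, sock: socket.socket, offset: int, count: int) -> int:
    """The traditional path: two syscalls, and the record bytes
    transit a user-space buffer on their way to the socket."""
    log.seek(offset)
    buf = log.read(count)   # copy 1: pagecache -> user-space buffer
    sock.sendall(buf)       # copy 2: user-space buffer -> socket buffer
    return len(buf)
```

In a JVM broker, `buf` would live on the heap, which is why the heap-stays-small point above follows directly from removing this path.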

Kozlovski:

"Zero-copy, somewhat misleadingly named, is when the OS copies data from the pagecache directly to a socket, effectively bypassing Kafka's JVM entirely. There are still copies of the data being made — but they're reduced. This saves you a few extra copies and user <-> kernel mode switches." (Source: sources/2024-05-09-highscalability-kafka-101)

Why it's less load-bearing in practice

Kozlovski is explicit that this famous optimisation is overstated:

"While it sounds cool, it's unlikely the zero-copy plays a large role in optimizing Kafka due to two main reasons — first, CPU is rarely the bottleneck in well-optimized Kafka deployments, so the lack of in-memory copies doesn't buy you a lot of resources. Secondly, encryption and SSL/TLS (a must for all production deployments) already prohibit Kafka from using zero-copy due to modifying the message throughout its path. Despite this, Kafka still performs."

Two failure modes:

  • TLS disables the sendfile path. The broker terminates TLS, so it must pull every byte through the CPU to encrypt it before it reaches the wire; the kernel cannot splice plaintext from pagecache straight into a TLS stream. The byte-identical-on-disk-and-wire invariant no longer holds once encryption sits in the path. Production Kafka is TLS'd, so sendfile is off.
  • CPU isn't the bottleneck anyway. Well-tuned Kafka deployments bottleneck on the network, not the broker CPU. Saving CPU on the hot path doesn't unlock throughput.

Why it still gets included in the pattern catalogue

  • Historical importance. The zero-copy framing was a major narrative beat in the original Kafka performance story and shaped the design of subsequent log-based systems.
  • Non-TLS internal hops. Some operators run unencrypted broker-to-broker replication within a trust domain, where the optimisation still applies.
  • Design invariant it forces. The pattern requires byte-identical on-disk and wire formats. That discipline has value even when sendfile is off, because it means the broker never has to re-serialise records — the CPU work of serialisation-on-every-read is gone independently of whether the copy is in kernel or user space.
