PATTERN
Zero-copy sendfile at broker¶
Pattern¶
When a messaging broker ships records from the tail of its on-disk log to a consumer's socket, use the OS's sendfile (or equivalent zero-copy primitive) to have the kernel copy data directly from pagecache into the socket buffer, bypassing the application's heap entirely.
Canonical production instance: Apache Kafka broker serving consumer fetches. For this to work, the bytes on disk must be byte-identical to the wire format so the kernel can splice them with no transformation — which is exactly the invariant Kafka maintains: "Kafka stores messages in a standardized binary format unmodified throughout the whole flow (producer ➡ broker ➡ consumer)." Kozlovski, sources/2024-05-09-highscalability-kafka-101.
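The mechanics can be sketched in a few lines. This is an illustrative sketch, not Kafka's code (the broker does the equivalent through Java's `FileChannel.transferTo`, which maps to sendfile(2) on Linux); `serve_log_segment` and its parameters are hypothetical names:

```python
import os
import socket

def serve_log_segment(path: str, sock: socket.socket, offset: int, count: int) -> int:
    """Splice `count` bytes of an on-disk log segment straight into a socket.
    os.sendfile asks the kernel to move pagecache bytes into the socket
    buffer; the data never enters this process's user-space heap."""
    with open(path, "rb") as segment:
        sent = 0
        while sent < count:
            # One syscall per chunk; no read() into a heap buffer first.
            n = os.sendfile(sock.fileno(), segment.fileno(), offset + sent, count - sent)
            if n == 0:  # hit end of the segment
                break
            sent += n
        return sent
```

`os.sendfile` is available on Linux and the BSDs; the loop is needed because the kernel may transfer fewer bytes than requested per call.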
Why it wins (in theory)¶
- Zero user-space copy. Without sendfile, the broker (JVM) reads N bytes from pagecache into the JVM heap, then writes N bytes from the JVM heap into the socket. That's two user/kernel crossings plus two memory copies; sendfile collapses it to one kernel-space memcpy or DMA.
- Reduced context switches. Fewer mode-switches at high request rates.
- JVM heap stays small. The broker doesn't need heap room to buffer the records it's about to ship; pagecache is already the buffer.
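For contrast, the path sendfile removes looks like the loop below. A hypothetical sketch (names are illustrative) of the read-then-write cycle a broker would otherwise run:

```python
import socket

def two_copy_send(path: str, sock: socket.socket, count: int, bufsize: int = 64 * 1024) -> int:
    """The path sendfile replaces: every chunk is copied pagecache -> user
    buffer (read) and user buffer -> socket buffer (sendall), costing a
    syscall pair and two mode switches per chunk."""
    with open(path, "rb") as segment:
        sent = 0
        while sent < count:
            chunk = segment.read(min(bufsize, count - sent))  # copy 1: kernel -> heap
            if not chunk:
                break
            sock.sendall(chunk)                               # copy 2: heap -> kernel
            sent += len(chunk)
        return sent
```

Note that the heap buffer here is exactly the allocation the "JVM heap stays small" point says the broker gets to skip.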
Kozlovski:
"Zero-copy, somewhat misleadingly named, is when the OS copies data from the pagecache directly to a socket, effectively bypassing Kafka's JVM entirely. There are still copies of the data being made — but they're reduced. This saves you a few extra copies and user <-> kernel mode switches." (Source: sources/2024-05-09-highscalability-kafka-101)
Why it's less load-bearing in practice¶
Kozlovski is explicit that this famous optimisation is overstated:
"While it sounds cool, it's unlikely the zero-copy plays a large role in optimizing Kafka due to two main reasons — first, CPU is rarely the bottleneck in well-optimized Kafka deployments, so the lack of in-memory copies doesn't buy you a lot of resources. Secondly, encryption and SSL/TLS (a must for all production deployments) already prohibit Kafka from using zero-copy due to modifying the message throughout its path. Despite this, Kafka still performs."
Two failure modes:
- TLS disables the sendfile path. The kernel would be splicing ciphertext, but the broker is the party doing the TLS encryption, and it has to run CPU through the bytes to encrypt them. The byte-identical-on-disk-and-wire invariant doesn't hold post-TLS. Production Kafka is TLS'd, so sendfile is off.
- CPU isn't the bottleneck anyway. Well-tuned Kafka deployments bottleneck on the network, not the broker CPU. Saving CPU on the hot path doesn't unlock throughput.
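The TLS failure mode can be made concrete. In the sketch below (hypothetical names; `encrypt` stands in for real TLS record processing such as SSLEngine or OpenSSL), any per-connection transformation forces the bytes back through user space, which is precisely what sendfile cannot do:

```python
import socket

def encrypted_send(path: str, sock: socket.socket, count: int, encrypt) -> None:
    """Why TLS forecloses sendfile: the wire bytes must differ from the
    on-disk bytes, so the data has to visit user space to be rewritten.
    `encrypt` stands in for TLS record encryption."""
    with open(path, "rb") as segment:
        plaintext = segment.read(count)   # forced copy into user space
        sock.sendall(encrypt(plaintext))  # wire format != on-disk format
```

Because the output depends on per-session key material, the kernel cannot splice the pagecache bytes verbatim, regardless of how the file is laid out.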
Why it still gets included in the pattern catalogue¶
- Historical importance. The zero-copy framing was a major narrative beat in the original Kafka performance story and shaped the design of subsequent log-based systems.
- Non-TLS internal hops. Some operators run unencrypted broker-to-broker replication within a trust domain, where the optimisation still applies.
- Design invariant it forces. The pattern requires byte-identical on-disk and wire formats. That discipline has value even when sendfile is off, because it means the broker never has to re-serialise records: the CPU work of serialisation-on-every-read is gone independently of whether the copy is in kernel or user space.
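The invariant itself can be shown without any sendfile at all. A toy sketch (the `Segment` class and its methods are invented for illustration, not Kafka's API): the broker appends the producer's already-serialized batch verbatim and serves those same bytes on fetch, so no per-read serialisation step exists to optimise away.

```python
class Segment:
    """Toy log segment illustrating the byte-identical invariant: batches
    are stored exactly as produced and served exactly as stored."""

    def __init__(self) -> None:
        self._log = bytearray()

    def append(self, batch: bytes) -> tuple[int, int]:
        """Append a producer's serialized batch unmodified; return (offset, length)."""
        offset = len(self._log)
        self._log += batch  # stored exactly as it arrived on the wire
        return offset, len(batch)

    def fetch(self, offset: int, length: int) -> bytes:
        """Serve stored bytes verbatim; no deserialise/re-serialise round trip."""
        return bytes(self._log[offset:offset + length])
```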
Seen in¶
- sources/2024-05-09-highscalability-kafka-101 — canonical wiki statement of the pattern + Kozlovski's honest-assessment caveat that it matters less than Kafka performance folklore suggests.
Related¶
- systems/kafka
- concepts/pagecache-for-messaging — the substrate the pattern reads from.
- concepts/zero-copy-sharing — adjacent zero-copy primitive.
- patterns/batch-over-network-to-broker — the write-side batching pattern that composes with pagecache on the read side.