PATTERN
Zero-copy sendfile at broker¶
Pattern¶
When a messaging broker ships records from the tail of its on-disk log to a consumer's socket, use the OS's sendfile (or equivalent zero-copy primitive) to have the kernel copy data directly from pagecache into the socket buffer, bypassing the application's heap entirely.
Canonical production instance: Apache Kafka broker serving consumer fetches. For this to work, the bytes on disk must be byte-identical to the wire format so the kernel can splice them with no transformation — which is exactly the invariant Kafka maintains: "Kafka stores messages in a standardized binary format unmodified throughout the whole flow (producer ➡ broker ➡ consumer)." Kozlovski, sources/2024-05-09-highscalability-kafka-101.
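The mechanics can be sketched in a few lines. This is an illustrative sketch, not Kafka's code (the broker does the equivalent through Java's `FileChannel.transferTo`, which maps to sendfile(2) on Linux); `serve_log_segment` and its parameters are hypothetical names:

```python
import os
import socket

def serve_log_segment(path: str, sock: socket.socket, offset: int, count: int) -> int:
    """Splice `count` bytes of an on-disk log segment straight into a socket.
    os.sendfile asks the kernel to move pagecache bytes into the socket
    buffer; the data never enters this process's user-space heap."""
    with open(path, "rb") as segment:
        sent = 0
        while sent < count:
            # One syscall per chunk; no read() into a heap buffer first.
            n = os.sendfile(sock.fileno(), segment.fileno(), offset + sent, count - sent)
            if n == 0:  # hit end of the segment
                break
            sent += n
        return sent
```

`os.sendfile` is available on Linux and the BSDs; the loop is needed because the kernel may transfer fewer bytes than requested per call.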
Why it wins (in theory)¶
- Zero user-space copy. Without sendfile, the broker (JVM) reads N bytes from pagecache into the JVM heap, then writes N bytes from the JVM heap into the socket. That's two user/kernel crossings plus two memory copies; sendfile collapses it to one kernel-space memcpy or DMA.
- Reduced context switches. Fewer mode-switches at high request rates.
- JVM heap stays small. The broker doesn't need heap room to buffer the records it's about to ship; pagecache is already the buffer.
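For contrast, the path sendfile removes looks like the loop below. A hypothetical sketch (names are illustrative) of the read-then-write cycle a broker would otherwise run:

```python
import socket

def two_copy_send(path: str, sock: socket.socket, count: int, bufsize: int = 64 * 1024) -> int:
    """The path sendfile replaces: every chunk is copied pagecache -> user
    buffer (read) and user buffer -> socket buffer (sendall), costing a
    syscall pair and two mode switches per chunk."""
    with open(path, "rb") as segment:
        sent = 0
        while sent < count:
            chunk = segment.read(min(bufsize, count - sent))  # copy 1: kernel -> heap
            if not chunk:
                break
            sock.sendall(chunk)                               # copy 2: heap -> kernel
            sent += len(chunk)
        return sent
```

Note that the heap buffer here is exactly the allocation the "JVM heap stays small" point says the broker gets to skip.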
Kozlovski:
"Zero-copy, somewhat misleadingly named, is when the OS copies data from the pagecache directly to a socket, effectively bypassing Kafka's JVM entirely. There are still copies of the data being made — but they're reduced. This saves you a few extra copies and user <-> kernel mode switches." (Source: sources/2024-05-09-highscalability-kafka-101)
Why it's less load-bearing in practice¶
Kozlovski is explicit that this famous optimisation is overstated:
"While it sounds cool, it's unlikely the zero-copy plays a large role in optimizing Kafka due to two main reasons — first, CPU is rarely the bottleneck in well-optimized Kafka deployments, so the lack of in-memory copies doesn't buy you a lot of resources. Secondly, encryption and SSL/TLS (a must for all production deployments) already prohibit Kafka from using zero-copy due to modifying the message throughout its path. Despite this, Kafka still performs."
Two failure modes:
- TLS disables the sendfile path. The kernel would be splicing ciphertext, but the broker is the party doing the TLS encryption, and it has to run CPU through the bytes to encrypt them. The byte-identical-on-disk-and-wire invariant doesn't hold post-TLS. Production Kafka is TLS'd, so sendfile is off.
- CPU isn't the bottleneck anyway. Well-tuned Kafka deployments bottleneck on the network, not the broker CPU. Saving CPU on the hot path doesn't unlock throughput.
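The TLS failure mode can be made concrete. In the sketch below (hypothetical names; `encrypt` stands in for real TLS record processing such as SSLEngine or OpenSSL), any per-connection transformation forces the bytes back through user space, which is precisely what sendfile cannot do:

```python
import socket

def encrypted_send(path: str, sock: socket.socket, count: int, encrypt) -> None:
    """Why TLS forecloses sendfile: the wire bytes must differ from the
    on-disk bytes, so the data has to visit user space to be rewritten.
    `encrypt` stands in for TLS record encryption."""
    with open(path, "rb") as segment:
        plaintext = segment.read(count)   # forced copy into user space
        sock.sendall(encrypt(plaintext))  # wire format != on-disk format
```

Because the output depends on per-session key material, the kernel cannot splice the pagecache bytes verbatim, regardless of how the file is laid out.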
Why it still gets included in the pattern catalogue¶
- Historical importance. The zero-copy framing was a major narrative beat in the original Kafka performance story and shaped the design of subsequent log-based systems.
- Non-TLS internal hops. Some operators run unencrypted broker-to-broker replication within a trust domain, where the optimisation still applies.
- Design invariant it forces. The pattern requires byte-identical on-disk and wire formats. That discipline has value even when sendfile is off, because it means the broker never has to re-serialise records: the CPU work of serialisation-on-every-read is gone independently of whether the copy is in kernel or user space.
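The invariant itself can be shown without any sendfile at all. A toy sketch (the `Segment` class and its methods are invented for illustration, not Kafka's API): the broker appends the producer's already-serialized batch verbatim and serves those same bytes on fetch, so no per-read serialisation step exists to optimise away.

```python
class Segment:
    """Toy log segment illustrating the byte-identical invariant: batches
    are stored exactly as produced and served exactly as stored."""

    def __init__(self) -> None:
        self._log = bytearray()

    def append(self, batch: bytes) -> tuple[int, int]:
        """Append a producer's serialized batch unmodified; return (offset, length)."""
        offset = len(self._log)
        self._log += batch  # stored exactly as it arrived on the wire
        return offset, len(batch)

    def fetch(self, offset: int, length: int) -> bytes:
        """Serve stored bytes verbatim; no deserialise/re-serialise round trip."""
        return bytes(self._log[offset:offset + length])
```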
Seen in¶
- sources/2024-05-09-highscalability-kafka-101 — canonical wiki statement of the pattern + Kozlovski's honest-assessment caveat that it matters less than Kafka performance folklore suggests.
Related¶
- systems/kafka
- concepts/pagecache-for-messaging — the substrate the pattern reads from.
- concepts/zero-copy-sharing — adjacent zero-copy primitive.
- patterns/batch-over-network-to-broker — the write-side batching pattern that composes with pagecache on the read side.