PATTERN
Broker write caching as client-tuning substitute¶
Problem¶
A Kafka-API streaming cluster is CPU-saturated because producer batches are too small, but the producers aren't tunable:
- The producer fleet is owned by a different team on a different release cadence — coordinating linger.ms / batch.size adjustments is slow or impossible.
- Producers span multiple languages / clients (Java, librdkafka-based Python, Go, Rust) with inconsistent defaults — a uniform policy can't be applied.
- Topic / partition layout is locked by a downstream ordering contract — restructuring to reduce partition fan-out isn't an option.
- Producers are third-party (external customers of a data-platform service) and cannot be reached at all.
The CPU-saturation pain is real; the producer-side fix isn't available.
Solution¶
Enable broker-side write caching (Redpanda) or rely on Kafka's OS buffer-cache default (legacy Kafka). The broker coalesces many small in-memory writes into larger disk flushes in the background, reclaiming the batching economics the producers aren't providing — at the cost of a bounded durability relaxation (ack-on-memory at quorum, flush to disk in background).
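A back-of-envelope sketch of the batching economics being reclaimed. All numbers are illustrative assumptions, not measurements from the source:

```python
import math

# Illustrative assumptions: tiny producer batches, a broker that
# coalesces buffered writes into ~1 MiB background flushes.
record_size = 512          # bytes per incoming (tiny) batch
records = 100_000          # batches arriving in the window
flush_block = 1 << 20      # coalesced flush size with write caching

# Without write caching: roughly one disk write per incoming batch.
writes_uncached = records

# With write caching: background flushes of coalesced blocks.
writes_cached = math.ceil(records * record_size / flush_block)

print(writes_uncached, writes_cached)  # 100000 vs 49
```

Same ingested bytes, roughly three orders of magnitude fewer disk operations — which is the economics small producer batches were failing to provide.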
The equivalence frame, verbatim from Redpanda's 2024-11-26 part 2 (Source: sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2):
"Write caching is a mechanism Redpanda supports that helps alleviate the broker-side issue from having many tiny batches (or single message batches). This is especially useful for cases where your architecture makes it hard to do client-side tuning, change the producer behavior, or adjust topic and partition design."
And the durability equivalence:
"When write caching is enabled in Redpanda, the data durability guarantees are relaxed but no worse than a legacy Kafka cluster."
Mechanism summary¶
Default: Client → broker → disk → ack
With write caching: Client → broker → memory → ack
→ (background) flush large block → disk
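The ack-on-memory plus bounded background flush can be sketched as a toy state machine. This is a hypothetical illustration, not Redpanda's implementation; the flush triggers are modeled on bounded-staleness knobs of the flush-bytes / flush-interval kind:

```python
import time

class WriteCachingLog:
    """Toy sketch: ack once a record is in memory, flush to disk in
    the background when a byte or time bound is hit (illustrative)."""

    def __init__(self, flush_bytes=1 << 20, flush_ms=100):
        self.flush_bytes = flush_bytes
        self.flush_ms = flush_ms
        self.buffer = []
        self.buffered_bytes = 0
        self.last_flush = time.monotonic()
        self.flushes = 0

    def append(self, record: bytes) -> str:
        # Ack as soon as the record is buffered in memory. (In the
        # real system the ack waits for a memory-write quorum across
        # replicas; replication is elided here.)
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        self._maybe_flush()
        return "ack"

    def _maybe_flush(self):
        elapsed_ms = (time.monotonic() - self.last_flush) * 1000
        if self.buffered_bytes >= self.flush_bytes or elapsed_ms >= self.flush_ms:
            # One large sequential disk write instead of many tiny ones.
            self.flushes += 1
            self.buffer.clear()
            self.buffered_bytes = 0
            self.last_flush = time.monotonic()
```

With flush_bytes=4096, sixteen 512-byte appends all ack immediately but trigger only two background flushes — the coalescing the producers weren't doing.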
See concepts/broker-write-caching for the full state machine.
When to prefer this over client-side tuning¶
| Signal | Prefer client-side tuning | Prefer broker write caching |
|---|---|---|
| Producer ownership | Same team | Other team / third-party |
| Client-library diversity | Single library | Multi-language fleet |
| Partition layout flexibility | Modifiable | Frozen by contract |
| Durability strictness | Disk-fsync required | Legacy-Kafka durability acceptable |
| Tuning iteration speed | Fast (single-team deploys) | Slow (cross-team coordination) |
The decision is primarily organisational, not technical. The technical floor (Kafka-legacy-durability equivalence) is acceptable for the vast majority of streaming workloads.
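The table above can be encoded as a small decision helper. This is a hypothetical sketch of the table's logic, not a function from the source:

```python
def prefer_write_caching(producer_ownership: str,
                         client_diversity: str,
                         layout_flexible: bool,
                         fsync_required: bool) -> bool:
    """Encode the signal table: True => broker write caching is the
    better lever; False => prefer client-side tuning (illustrative)."""
    if fsync_required:
        # Hard disk-fsync-before-ack SLA rules out write caching.
        return False
    signals = [
        producer_ownership != "same-team",   # other team / third-party
        client_diversity == "multi",         # multi-language fleet
        not layout_flexible,                 # layout frozen by contract
    ]
    return any(signals)
```

Note the asymmetry the helper encodes: a durability requirement is a veto, while any single organisational blocker is enough to reach for the broker-side lever.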
When not to use¶
- Workloads with hard synchronous-durability SLAs (financial systems requiring disk-fsync-before-ack, mission-critical append logs). Kafka-legacy durability does not survive simultaneous leader + follower-quorum memory loss.
- Workloads where the producer fleet is tunable — prefer iterative linger tuning because producer-side fixes reduce network and CPU cost, whereas broker-side caching only addresses the disk-write and commit-latency axes.
Compose with client-side tuning¶
Write caching and client-side batching are additive, not exclusive. If both are feasible:
- Start with client-side tuning (reduces network bandwidth + producer CPU + broker CPU).
- Enable write caching for the residual small-batch workloads that can't be tuned (reduces disk-write amplification + commit latency for those).
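The composed posture might look like the following config sketch. Key names follow librdkafka-style producer properties and Redpanda's topic-level write-caching property; the specific values are illustrative assumptions, not recommendations from the source:

```python
# Client-side batching knobs for the tunable fleet (librdkafka-style
# names; values are illustrative starting points).
producer_config = {
    "linger.ms": 50,        # wait up to 50 ms to accumulate a batch
    "batch.size": 131072,   # target batch size in bytes
}

# Residual topics whose producers can't be tuned get broker-side
# write caching instead (e.g. Redpanda's write.caching topic property).
untunable_topic_config = {
    "third-party-ingest": {"write.caching": "true"},
}
```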
Seen in¶
- sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda — extends the pattern with the hardware-shortfall trigger (non-NVMe storage: SSD, spinning disk, SAN, remote storage). Where the 2024-11-26 Kinley post framed write caching as an organisational substitute (producer team not tunable), this post frames it as a hardware substitute — the same primitive solves two distinct classes of problem. Reaffirms acks=all + multi-AZ as mandatory companions.
- sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2 — canonical wiki source. Names the three client-side-untunable cases; describes the ack-on-memory + background-flush mechanism; frames durability equivalence with legacy Kafka; storage-controller-analogy pedagogy.
Related¶
- concepts/broker-write-caching — the concept page.
- concepts/effective-batch-size — producer-side axis this pattern cannot fix (network cost stays high).
- concepts/small-batch-nvme-write-amplification — what write caching reclaims at the disk layer.
- concepts/acks-producer-durability — producer-side durability control; composes with write caching (ack waits for memory-write quorum, not disk).
- concepts/durability-vs-consistency-guarantee — generic framing of the trade-off.
- patterns/iterative-linger-tuning-production-case — the producer-side alternative; prefer where feasible.
- systems/redpanda, systems/kafka.