CONCEPT Cited by 5 sources

Compression codec trade-off

Definition

Compression codec trade-off is the choice a streaming producer makes between space / bandwidth savings (high compression ratio = fewer bytes over the wire, less disk, cheaper cross-region transfer) and CPU time (compression and decompression are CPU-bound operations). No single codec dominates both axes; the operator picks a point on the curve.

Redpanda's verbatim guidance (Source: sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda):

"There are many choices of compression codecs. Some will compress extremely well, but also require a significant amount of CPU time and memory. Others will compress more moderately, but use far fewer resources. A classic tradeoff."

Why compression matters

Kafka's data path involves many byte transfers:

"Producers spend their days sending data, which Redpanda dutifully writes to NVMe devices and sends it over the network to other brokers to do the same. Consumers then send requests for data (via the network), so Redpanda retrieves it (from memory or NVMe) and sends it back over the network. Finally, consumers send in their commits. That's a lot of data transfers."

Each medium (network, disk) has fixed capacity. Compression is the lever: "If you can compress messages at a ratio of 5:1, you can reduce what you would have sent by 80%, which helps every stage of the data lifecycle (ingestion, storage, and retrieval)."
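The 5:1 arithmetic is worth making explicit. A quick check (illustrative numbers, not from the source):

```python
# A 5:1 compression ratio means each byte sent is 1/5 of the original.
original_bytes = 1_000_000
ratio = 5

sent_bytes = original_bytes / ratio
reduction = 1 - sent_bytes / original_bytes

print(f"sent: {sent_bytes:.0f} bytes, reduction: {reduction:.0%}")  # 80%
```

Every stage downstream of the producer (broker NVMe writes, inter-broker replication, consumer fetches) moves the reduced byte count, which is why the savings compound across the lifecycle.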

The codecs

The four canonical Kafka / Redpanda producer codecs:

| Codec  | Compression ratio | CPU cost | Memory cost | Typical use                           |
|--------|-------------------|----------|-------------|---------------------------------------|
| gzip   | High              | High     | High        | Legacy; archives where CPU is cheap   |
| snappy | Low-medium        | Low      | Low         | Legacy; fast but weak compression     |
| LZ4    | Medium            | Low      | Low         | General-purpose default; good balance |
| ZSTD   | High              | Medium   | Medium      | Modern default when ratio matters     |

The post's bottom-line recommendation:

"Use ZSTD or LZ4 for a good balance between compression ratio and CPU time if compression is essential."

ZSTD (Zstandard, Facebook 2016) is the modern sweet spot — gzip-class ratios at a fraction of the CPU.

LZ4 (2011) is the speed-first choice: lower ratio than ZSTD but markedly lower CPU cost, particularly on decompression. Preferred for CPU-constrained consumers or compaction-heavy topics (see concepts/compression-compaction-cpu-cost).

gzip and snappy are legacy choices; there is no reason to prefer them over ZSTD or LZ4 for new workloads.
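In practice the codec is a single producer setting. A minimal sketch (key names follow the librdkafka / confluent-kafka convention; the broker address and values are assumptions, not from the source):

```python
# Producer config sketch: the codec choice is one line.
# Keys follow librdkafka naming (an assumption for illustration).
producer_config = {
    "bootstrap.servers": "localhost:9092",
    "compression.type": "zstd",   # or "lz4" for CPU-constrained paths
    "linger.ms": 50,              # give batches time to fill
    "batch.size": 131072,         # larger batches compress better
}
```

Note that `linger.ms` and `batch.size` appear alongside the codec: as the next section shows, the two decisions are coupled.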

Compression composes with batching

The ratio is a function of batch size — bigger batches compress better because there are more opportunities for dictionary reuse across records. From Kinley 2024-11-19 part 1:

"The compression ratio improves as you compress more messages at once since it can take advantage of the similarities between messages."

Implications:

  • Small batches (< 4 KB) compress poorly — there's no dictionary to reuse.
  • Larger batches compress asymptotically better, with diminishing returns past ~100 KB for most payloads.
  • Any decision to adopt compression should be co-tuned with effective batch size. A well-batched LZ4 workload outperforms a tiny-batch ZSTD one.
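The batching effect is easy to demonstrate with any DEFLATE-family codec. A sketch using stdlib zlib as a stand-in (the record shape is a synthetic assumption):

```python
import json
import zlib

# One synthetic event record; real payloads vary.
def record(i: int) -> bytes:
    return json.dumps(
        {"user_id": 10_000 + i, "event": "page_view", "ts": 1_700_000_000 + i}
    ).encode()

batch = b"".join(record(i) for i in range(100))

# Compress each record on its own vs. the whole batch at once.
per_record = sum(len(zlib.compress(record(i))) for i in range(100))
batched = len(zlib.compress(batch))

print(f"100 records compressed individually: {per_record} bytes")
print(f"100 records compressed as one batch:  {batched} bytes")
```

The per-record total barely shrinks (a lone small record can even grow), while the batch compresses sharply because the codec exploits repetition across records: exactly the cross-message similarity the quote describes.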

This interacts with the production case study in Redpanda's 2024-11-26 part 2 (sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2): after linger-tuning, the customer's bandwidth dropped from 1.1 GB/sec to 575 MB/sec for the same 1.2 M msg/sec flow; compression was a major contributor alongside reduced Kafka-metadata overhead. Larger batches → better compression ratios.

Compress on the client, not the broker

Two places compression could happen:

  1. Client-side: producer compresses batches; consumer decompresses. Broker treats batches as opaque bytes.
  2. Broker-side: client sends uncompressed; broker compresses before writing; decompresses on consumer read.

Redpanda's rule: client-side, always. Verbatim:

"Compress on the client, not the broker (topic configuration for compression should be set to producer)."

Setting compression.type=producer on the topic means the broker accepts whatever codec the client chose and passes it through unchanged — no broker CPU spent. Canonicalised as patterns/client-side-compression-over-broker-compression.
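A sketch of the topic-level setting, shown as a plain config mapping (how it is applied depends on your tooling; the mapping shape is an assumption):

```python
# Topic config sketch: "producer" tells the broker to store and serve
# batches with whatever codec the client chose -- no broker-side CPU.
# Any concrete codec value here (e.g. "zstd") would instead force the
# broker to recompress batches that arrive in a different codec.
topic_config = {
    "compression.type": "producer",
}
```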

Clients compress batches, not individual messages: "Clients compress batches, not messages, therefore increasing batching will also make compression more effective."

When compression is wrong

  • Already-compressed payloads — JPEG images, MP4 video, gzipped logs all compress poorly and waste CPU. Double-compression adds overhead without savings.
  • Tiny batches — batches below ~1 KB can't accumulate enough repetition for the codec to exploit. The ratio approaches 1:1 (no savings), and the codec's framing overhead can even make the output larger than the input.
  • Compacted topics with ZSTD — every compaction pass must decompress + recompress. See concepts/compression-compaction-cpu-cost. If compaction + compression are both required, prefer LZ4.
  • Extreme-latency-sensitive paths — compression adds per-batch CPU; where every µs matters (financial HFT), uncompressed may win.
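The first two failure modes are easy to see empirically. A sketch using random bytes as a stand-in for already-compressed, high-entropy payloads (stdlib zlib as the codec):

```python
import os
import zlib

# High-entropy data stands in for JPEG/MP4/already-gzipped payloads.
payload = os.urandom(64 * 1024)

compressed = zlib.compress(payload)
ratio = len(payload) / len(compressed)

print(f"{len(payload)} -> {len(compressed)} bytes (ratio {ratio:.3f}:1)")
```

The output is no smaller than the input (framing overhead typically makes it slightly larger), so the CPU spent compressing is pure waste: the case for disabling compression on such topics.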

Seen in
