
CONCEPT Cited by 1 source

Consumer fetch tuning

Definition

Consumer fetch tuning is the Kafka / Redpanda client-side practice of explicitly configuring the four fetch parameters that govern how much data a consumer pulls per broker fetch request. The defaults are a midpoint compromise; most consumers skew toward either low latency (small fetches, fast returns) or high throughput (big fetches, amortised per-request overhead), and leaving the defaults in place typically underserves both regimes.

Redpanda's verbatim framing (Source: sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda):

"Most consumers will have a preference for either low latency or high throughput, and explicitly tuning the configuration towards that preference can have a huge impact on performance."

The four parameters

Parameter                    Role
fetch.min.bytes              Minimum bytes the broker must accumulate before returning the fetch response
fetch.max.wait.ms            Maximum time the broker waits for fetch.min.bytes to accumulate
max.partition.fetch.bytes    Maximum bytes a single fetch can return per partition
max.poll.records             Maximum records a single poll() returns to application code

The first two are the broker-side batch-formation dial — analogous to the producer's linger.ms / batch.size pair. The broker waits up to fetch.max.wait.ms for fetch.min.bytes to accumulate before returning. The last two are the client-side return-size dial — caps on per-partition and per-poll record counts.
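The broker-side trigger is an OR over the two conditions: the fetch returns as soon as either the byte threshold is met or the wait timer expires. A minimal sketch of that predicate (the function name and millisecond/byte arguments are illustrative, not part of any client API):

```python
def fetch_returns(bytes_available: int, elapsed_ms: int,
                  fetch_min_bytes: int, fetch_max_wait_ms: int) -> bool:
    """Return True when the broker would complete the fetch response:
    either enough bytes have accumulated, or the wait budget is spent."""
    return bytes_available >= fetch_min_bytes or elapsed_ms >= fetch_max_wait_ms

fetch_returns(2048, 10, 1024, 500)   # enough bytes, returns early
fetch_returns(100, 10, 1024, 500)    # neither condition met, broker keeps waiting
fetch_returns(100, 600, 1024, 500)   # timer expired, returns whatever is there
```

With fetch.min.bytes=1, the first condition fires on any data at all, which is why the low-latency column below pins it to 1 B.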

The post's verbatim table:

Parameter                    Low Latency    Default    High Throughput
fetch.min.bytes              1 B            1 B        1 MB+
fetch.max.wait.ms            < 50 ms        500 ms     > 1000 ms
max.partition.fetch.bytes    < 100 KB       1 MB       > 10 MB
max.poll.records             < 100          500        > 5000

Interpretation:

  • Low-latency regime: the consumer wants new records to reach the application as fast as possible. Small fetch.min.bytes (1 B ≈ "return anything as soon as it exists") + short fetch.max.wait.ms (< 50 ms) ensures the broker doesn't hold the fetch waiting. Small max.partition.fetch.bytes / max.poll.records means each poll returns a small bounded chunk, so the application processes records promptly rather than in large batches.
  • High-throughput regime: the consumer wants to minimise the per-fetch overhead of talking to the broker. 1 MB+ of fetch.min.bytes + > 1 s fetch.max.wait.ms ensures every fetch returns a full-sized chunk. Large max.partition.fetch.bytes (>10 MB) and max.poll.records (>5000) let the consumer process a large batch per poll — amortising the CPU cost of the poll-loop across many records.
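
The two regimes can be sketched as client config maps. The keys follow the Java client property names; the specific values are illustrative picks from within the table's ranges, not benchmarked recommendations:

```python
# Low-latency regime: return anything as soon as it exists, in small chunks.
LOW_LATENCY = {
    "fetch.min.bytes": 1,                            # any data triggers the return
    "fetch.max.wait.ms": 25,                         # < 50 ms: don't hold the fetch
    "max.partition.fetch.bytes": 64 * 1024,          # < 100 KB per partition
    "max.poll.records": 50,                          # small bounded chunk per poll()
}

# High-throughput regime: wait for full-sized chunks, process big batches.
HIGH_THROUGHPUT = {
    "fetch.min.bytes": 1024 * 1024,                  # 1 MB+: wait for a full chunk
    "fetch.max.wait.ms": 1500,                       # > 1 s: let the batch form
    "max.partition.fetch.bytes": 16 * 1024 * 1024,   # > 10 MB per partition
    "max.poll.records": 8000,                        # amortise the poll loop
}
```

Either dict can be merged into the base consumer configuration (bootstrap servers, group id, deserialisers) unchanged, since the keys are the standard property strings.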

Dual to the producer batching trade-off

The consumer-side tuning axis is the mirror image of the producer-side batching trade-off:

                        Producer                           Consumer
Latency dial            linger.ms                          fetch.max.wait.ms
Size dial               batch.size                         fetch.min.bytes, max.partition.fetch.bytes
Trigger logic           dispatch on size OR time           return fetch on size OR time
Underlying economics    amortise fixed per-request cost    amortise fixed per-fetch cost

Both sides exploit the same fixed-vs-variable request-cost substrate — every broker RPC has a fixed cost (connection handling, request parse, response frame) and a variable cost (bytes read/written). Big fetches amortise the fixed cost; small fetches don't.
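
The amortisation can be made concrete with a toy cost model. The fixed and per-record microsecond costs below are made-up illustrative numbers, not measurements of any real broker:

```python
def cost_per_record(records_per_fetch: int,
                    fixed_us: float = 200.0,
                    per_record_us: float = 2.0) -> float:
    """Toy model: each fetch pays a fixed RPC cost (connection handling,
    request parse, response frame) plus a per-record variable cost.
    Returns the average microseconds spent per record."""
    total_us = fixed_us + per_record_us * records_per_fetch
    return total_us / records_per_fetch

cost_per_record(1)       # -> 202.0: the fixed cost is paid per record
cost_per_record(10_000)  # -> 2.02: amortised to near the variable floor
```

The per-record cost collapses toward the variable floor as the fetch grows, which is the whole economic argument for the high-throughput column.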

Multi-threading on the client

The post adds a secondary recommendation:

"Consider multi-threading on the client application."

A consumer that processes each poll's records synchronously caps its throughput at roughly records-per-poll divided by the time taken to process that batch. Multi-threading (processing N records in parallel) decouples the fetch cadence from the processing cadence, which is useful when per-record processing is CPU-heavy.

Caveat: multi-threading complicates offset commits — a commit at position X implies all records ≤ X have been processed, so the application must either (a) commit only after all parallel workers for a batch finish, or (b) use async commit + idempotent processing to tolerate commit-position overshoot on crash.
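
Option (a) can be sketched without a live broker: fan one poll()'s worth of records out to a thread pool, and only advance the commit position once every worker has finished. The record shape, the process() body, and the final commit comment are stand-ins for the real client API, not part of it:

```python
from concurrent.futures import ThreadPoolExecutor

def process(record: dict) -> str:
    # Stand-in for the CPU-heavy per-record work that motivated
    # multi-threading in the first place.
    return record["value"].upper()

def handle_batch(records: list[dict], pool: ThreadPoolExecutor) -> int:
    """Process one batch in parallel, then return the offset that is safe
    to commit: one past the highest offset in the batch, and only after
    *all* workers have finished (option (a) above)."""
    futures = [pool.submit(process, r) for r in records]
    for f in futures:
        f.result()  # blocks until done; re-raises any worker exception
    return max(r["offset"] for r in records) + 1

batch = [{"offset": i, "value": f"msg-{i}"} for i in range(100, 110)]
with ThreadPoolExecutor(max_workers=4) as pool:
    commit_at = handle_batch(batch, pool)  # -> 110
# In a real consumer, the offset commit for this partition happens here,
# at commit_at, and never earlier.
```

Because the commit waits for the whole batch, a crash mid-batch re-delivers the entire batch rather than silently skipping records, which is exactly the invariant a commit at position X is supposed to encode.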

Caveats

  • The table is qualitative. The low-latency / default / high-throughput columns give order-of-magnitude ranges (<100 KB / 1 MB / >10 MB) without workload-specific benchmark validation. Operators should measure their own workloads against the defaults before committing to a tuned set.
  • max.partition.fetch.bytes interacts with the topic-level max.message.bytes. If an individual record is larger than max.partition.fetch.bytes, older clients could stall (the fetch returned nothing and never made progress); since Kafka 0.10.1 (KIP-74) the broker returns at least the first record regardless, but the safe rule stands: set this parameter at least as large as the largest expected record.
  • Consumer memory footprint scales with max.partition.fetch.bytes × assigned-partition-count. A consumer assigned 100 partitions with 10 MB per partition is a 1 GB buffer. On heap-constrained clients, this matters.
  • fetch.min.bytes=1B + fetch.max.wait.ms=0ms is "return immediately as soon as any record exists" — the worst case for broker CPU (fetch-per-record) but the lowest achievable latency. Choose this only when per-record latency dominates over broker CPU cost.
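
The memory-footprint caveat is simple multiplication; a rule-of-thumb sketch (the function name is illustrative, and this upper bound ignores any response-level size cap a real client may also apply):

```python
def worst_case_fetch_buffer_mb(partitions: int,
                               max_partition_fetch_bytes: int) -> float:
    """Upper bound on consumer fetch-buffer memory, assuming every
    assigned partition returns a full max.partition.fetch.bytes."""
    return partitions * max_partition_fetch_bytes / (1024 * 1024)

# The caveat's example: 100 partitions at 10 MB per partition.
worst_case_fetch_buffer_mb(100, 10 * 1024 * 1024)  # -> 1000.0 (MB, ~1 GB)
```

Running the numbers like this before a rebalance-heavy deployment is cheap insurance on heap-constrained clients.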

Seen in

Last updated · 470 distilled / 1,213 read