CONCEPT
Consumer fetch tuning¶
Definition¶
Consumer fetch tuning is the Kafka / Redpanda client-side practice of explicitly configuring the four fetch parameters that govern how much data a consumer pulls per broker fetch request. The defaults are a midpoint compromise; most consumers skew toward either low latency (small fetches, fast returns) or high throughput (big fetches, amortised per-request overhead), and leaving the defaults in place typically underserves both regimes.
Redpanda's verbatim framing (Source: sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda):
"Most consumers will have a preference for either low latency or high throughput, and explicitly tuning the configuration towards that preference can have a huge impact on performance."
The four parameters¶
| Parameter | Role |
|---|---|
| `fetch.min.bytes` | Minimum bytes the broker must accumulate before returning the fetch response |
| `fetch.max.wait.ms` | Maximum time the broker waits for `fetch.min.bytes` to accumulate |
| `max.partition.fetch.bytes` | Maximum bytes a single fetch can return per partition |
| `max.poll.records` | Maximum records a single `poll()` returns to application code |
The first two are the broker-side batch-formation dial, analogous to the producer's `linger.ms` / `batch.size` pair: the broker waits up to `fetch.max.wait.ms` for `fetch.min.bytes` to accumulate before returning. The last two are the client-side return-size dial: caps on per-partition fetch size and per-poll record count.
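The size-OR-time trigger on the broker side can be sketched as a tiny predicate. This is a model of the decision rule, not the broker's actual code; the function name and numbers are illustrative:

```python
def broker_should_respond(accumulated_bytes: int, waited_ms: int,
                          fetch_min_bytes: int, fetch_max_wait_ms: int) -> bool:
    """Model of the fetch-return decision: the broker responds as soon as
    EITHER the size threshold or the time threshold is crossed."""
    return accumulated_bytes >= fetch_min_bytes or waited_ms >= fetch_max_wait_ms

# Low-latency settings: 1 B already satisfies the size threshold.
broker_should_respond(1, 0, fetch_min_bytes=1, fetch_max_wait_ms=50)            # True
# High-throughput settings: 200 KB after 100 ms satisfies neither 1 MB nor 1000 ms.
broker_should_respond(200_000, 100, fetch_min_bytes=1_000_000,
                      fetch_max_wait_ms=1000)                                   # False
```

The same predicate explains why a high-throughput fetch still returns eventually: once `waited_ms` crosses `fetch.max.wait.ms`, the broker responds with whatever it has.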
Redpanda's recommended operating points¶
The post's verbatim table:
| Parameter | Low Latency | Default | High Throughput |
|---|---|---|---|
| `fetch.min.bytes` | 1 B | 1 B | 1 MB+ |
| `fetch.max.wait.ms` | < 50 ms | 500 ms | > 1000 ms |
| `max.partition.fetch.bytes` | < 100 KB | 1 MB | > 10 MB |
| `max.poll.records` | < 100 | 500 | > 5000 |
Interpretation:
- Low-latency regime: the consumer wants new records to reach the application as fast as possible. A small `fetch.min.bytes` (1 B ≈ "return anything as soon as it exists") plus a short `fetch.max.wait.ms` (< 50 ms) ensures the broker doesn't hold the fetch waiting. Small `max.partition.fetch.bytes` / `max.poll.records` means each poll returns a small bounded chunk, so the application processes records promptly rather than in large batches.
- High-throughput regime: the consumer wants to minimise the per-fetch overhead of talking to the broker. 1 MB+ of `fetch.min.bytes` plus > 1 s of `fetch.max.wait.ms` ensures every fetch returns a full-sized chunk. Large `max.partition.fetch.bytes` (> 10 MB) and `max.poll.records` (> 5000) let the consumer process a large batch per poll, amortising the CPU cost of the poll loop across many records.
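As config fragments, the two regimes might look like this. The property names are the Java-client spellings from the table; the specific values are illustrative order-of-magnitude picks within the table's ranges, not benchmarked recommendations:

```python
# Hypothetical low-latency operating point (values within the table's ranges).
low_latency = {
    "fetch.min.bytes": 1,                            # return anything as soon as it exists
    "fetch.max.wait.ms": 25,                         # < 50 ms
    "max.partition.fetch.bytes": 64 * 1024,          # < 100 KB
    "max.poll.records": 50,                          # < 100
}

# Hypothetical high-throughput operating point.
high_throughput = {
    "fetch.min.bytes": 1024 * 1024,                  # 1 MB+
    "fetch.max.wait.ms": 1500,                       # > 1000 ms
    "max.partition.fetch.bytes": 16 * 1024 * 1024,   # > 10 MB
    "max.poll.records": 8000,                        # > 5000
}
```

Either dict would be passed as consumer properties at construction time; the point is that all four knobs move together toward one regime, not independently.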
Dual to the producer batching trade-off¶
The consumer-side tuning axis is the mirror image of the producer-side batching trade-off:
| | Producer | Consumer |
|---|---|---|
| Latency dial | `linger.ms` | `fetch.max.wait.ms` |
| Size dial | `batch.size` | `fetch.min.bytes`, `max.partition.fetch.bytes` |
| Trigger logic | dispatch on size OR time | return fetch on size OR time |
| Underlying economics | amortise fixed per-request cost | amortise fixed per-fetch cost |
Both sides exploit the same fixed-vs-variable request-cost substrate — every broker RPC has a fixed cost (connection handling, request parse, response frame) and a variable cost (bytes read/written). Big fetches amortise the fixed cost; small fetches don't.
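The amortisation argument is just division: with some fixed cost per fetch RPC, the per-record share of that cost shrinks linearly with fetch size. The 200 µs figure below is a made-up placeholder, not a measured number:

```python
def fixed_overhead_per_record(fixed_cost_us: float, records_per_fetch: int) -> float:
    """Share of the fixed per-fetch cost (connection handling, request parse,
    response frame) carried by each record. The variable byte cost is the
    same in both regimes and cancels out of the comparison."""
    return fixed_cost_us / records_per_fetch

# Assuming a hypothetical 200 µs fixed cost per fetch RPC:
fixed_overhead_per_record(200, 1)      # 200.0 µs per record (fetch-per-record)
fixed_overhead_per_record(200, 5000)   # 0.04 µs per record (big-batch fetch)
```

This is the same arithmetic that justifies producer batching; only the direction of the RPC differs.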
Multi-threading on the client¶
The post adds a secondary recommendation:
"Consider multi-threading on the client application."
A consumer that processes records synchronously per poll caps throughput at records-per-poll / processing-latency. Multi-threading (processing N records in parallel) uncouples the fetch cadence from the processing cadence, which is useful when per-record processing is CPU-heavy.
Caveat: multi-threading complicates offset commits — a commit at position X implies all records ≤ X have been processed, so the application must either (a) commit only after all parallel workers for a batch finish, or (b) use async commit + idempotent processing to tolerate commit-position overshoot on crash.
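Option (a) can be sketched as follows. The `process` and `commit` callbacks are hypothetical application hooks, and the records are plain dicts with an `offset` key; a real consumer would use its client library's record type and commit API:

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch_then_commit(records, process, commit, workers=8):
    """Option (a): fan a polled batch out to a thread pool, block until every
    worker finishes, and only then commit. The committed position is the last
    offset + 1, so the commit never runs ahead of completed work."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(process, records))   # blocks until all records are processed
    if records:
        commit(max(r["offset"] for r in records) + 1)
```

The cost of option (a) is head-of-line blocking: the slowest record in a batch delays the commit for the whole batch, which is exactly the trade option (b) avoids at the price of needing idempotent processing.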
Caveats¶
- The table is qualitative. The low-latency / default / high-throughput columns give order-of-magnitude ranges (< 100 KB / 1 MB / > 10 MB) without workload-specific benchmark validation. Operators should measure their own workloads against the defaults before committing to a tuned set.
- `max.partition.fetch.bytes` interacts with topic-level `max.message.bytes`. If individual records are larger than `max.partition.fetch.bytes`, the fetch can hang; this parameter must be at least as large as the largest expected record.
- Consumer memory footprint scales with `max.partition.fetch.bytes` × assigned-partition-count. A consumer assigned 100 partitions with 10 MB per partition is a 1 GB buffer. On heap-constrained clients, this matters.
- `fetch.min.bytes` = 1 B + `fetch.max.wait.ms` = 0 ms is "return immediately as soon as any record exists": the worst case for broker CPU (a fetch per record) but the lowest achievable latency. Choose this only when per-record latency dominates over broker CPU cost.
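The memory-footprint caveat is a simple product; a helper (name illustrative) makes the worst-case bound explicit:

```python
def fetch_buffer_upper_bound(max_partition_fetch_bytes: int,
                             assigned_partitions: int) -> int:
    """Worst-case fetch-buffer memory: every assigned partition returns a
    full max.partition.fetch.bytes chunk in the same fetch cycle."""
    return max_partition_fetch_bytes * assigned_partitions

fetch_buffer_upper_bound(10 * 1024 * 1024, 100)  # 1_048_576_000 bytes, ~1 GB
```

Checking this bound against the client's heap before adopting the high-throughput column is cheap insurance, especially when a rebalance can suddenly assign a consumer many more partitions.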
Seen in¶
- sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda — canonical wiki source. Four-parameter table; low-latency-vs-high-throughput positioning; multi-threading callout.
Related¶
- systems/kafka, systems/redpanda — Kafka-API consumers.
- concepts/consumer-group — the coordination layer on which fetch requests happen.
- concepts/kafka-partition — fetches are per-partition.
- concepts/batching-latency-tradeoff — the dual producer-side trade-off.
- concepts/fixed-vs-variable-request-cost — the substrate economics that make fetch-tuning matter.
- concepts/effective-batch-size — producer-side parallel.
- concepts/kafka-consumer-lag-metric — the signal that drives throughput-regime tuning.
- concepts/offset-commit-cost — the commit-side cost that composes with fetch tuning.