PATTERN
Iterative linger-tuning production case¶
Problem¶
A Kafka-API streaming cluster is CPU-saturated under heavy produce
load. Operators suspect producer-side linger.ms is too low —
producer batches are closing early (before batch.size fills),
flooding the broker with tiny requests. But which linger.ms
value is correct? How much latency can be reclaimed? Does the
cluster need more hardware, or just a better configuration?
Classic single-step guessing fails because:
- Latency can temporarily rise when linger.ms is raised in the normal regime — an operator who reverts at the first sign of higher average latency misses the saturation-regime inversion.
- Percentile-by-percentile behaviour under tuning is not monotonic — p50 can improve smoothly while p99.999 lags several rounds of tuning behind.
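Why tiny batches saturate the broker can be seen with back-of-envelope arithmetic. The sketch below uses the case study's ~1.2 M msg/sec; the per-batch record counts are illustrative assumptions, not measured values:

```python
# Illustration (record counts are assumptions, not from the case study):
# the broker's produce-request rate is message rate / records per batch.
def requests_per_sec(msg_rate: float, records_per_batch: float) -> float:
    """Produce requests the broker must service per second."""
    return msg_rate / records_per_batch

MSG_RATE = 1_200_000  # the case study's ~1.2 M msg/sec

tiny = requests_per_sec(MSG_RATE, 2)    # linger.ms too low: ~2 records/batch
full = requests_per_sec(MSG_RATE, 100)  # healthy batching: ~100 records/batch

print(f"{tiny:,.0f} req/s vs {full:,.0f} req/s")
```

A 50× difference in request rate at identical message rate — which is the CPU-saturation mechanism the operators suspected.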
Solution¶
Iteratively adjust linger.ms in multiple rounds, each round
evaluated against a quantitative percentile-by-percentile latency
table and the
Prometheus effective-batch-size dashboard.
Canonicalised from Redpanda's 2024 customer
case study (Source:
sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2).
Round structure¶
Each round:
- Pick the next linger.ms value. Move gradually (e.g. 2×, 5×, 10× the current value), not in one large jump — the saturation regime is entered and exited gradually.
- Apply to producers across the fleet. If application-team-owned, coordinate a deploy; if broker-owned, hot-reload.
- Wait for steady state — several minutes of production traffic at the new setting.
- Record percentile table: p50, p85, p95, p99, p99.999 at minimum.
- Record dashboard state: per-topic effective batch size, scheduler backlog, CPU utilisation, batch-rate-per-core.
- Decide: continue tuning if latency is still decreasing; stop when diminishing returns (or cluster CPU is healthily below saturation).
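The decide step can be sketched as a loop over candidate settings. This is a minimal sketch, not the Redpanda tooling: each dict stands in for one round's measured percentile table, and the 20 % median-improvement threshold for "diminishing returns" is an assumption:

```python
# Sketch of the round loop. One percentile table (a dict of measured
# latencies) per candidate linger.ms value; the 20% threshold on p50
# improvement is an assumed "diminishing returns" rule.
def stop_tuning(prev: dict, curr: dict, threshold: float = 0.20) -> bool:
    """True once the median stops improving meaningfully."""
    return (prev["p50"] - curr["p50"]) / prev["p50"] < threshold

def run_rounds(baseline: dict, tables: list[dict]) -> int:
    """Apply rounds until the stop rule fires; return rounds applied."""
    prev, applied = baseline, 0
    for table in tables:
        applied += 1
        if stop_tuning(prev, table):
            break  # diminishing returns: keep the previous round's setting
        prev = table
    return applied

# The case study's p50 progression (25 -> 15 -> 4 -> 1 ms), plus one
# hypothetical extra round that would trigger the stop rule.
rounds = run_rounds({"p50": 25},
                    [{"p50": 15}, {"p50": 4}, {"p50": 1}, {"p50": 0.9}])
```

Note that on the case study's real numbers the stop rule never fires within three rounds — every round cleared the threshold, consistent with a cluster deep in saturation.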
The Redpanda post applied three rounds over "several days" on a real Tier-7 BYOC cluster. Results table verbatim:
| Percentile | Original | Change 1 | Change 2 | Change 3 |
|---|---|---|---|---|
| p50 | 25 ms | 15 ms | 4 ms | < 1 ms |
| p85 | 55 ms | 32 ms | 17 ms | 3 ms |
| p95 | 90 ms | 57 ms | 32 ms | 6 ms |
| p99 | 128 ms | 100 ms | 63 ms | 17 ms |
| p99.999 | 490 ms | 260 ms | 240 ms | 130 ms |
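The per-percentile improvement factors implied by the table can be computed directly (treating "< 1 ms" as 1 ms, so the p50 factor is a lower bound):

```python
# Improvement factors from the table above: Original / Change 3.
# "< 1 ms" is taken as 1 ms, so the p50 factor is a lower bound.
original = {"p50": 25, "p85": 55, "p95": 90, "p99": 128, "p99.999": 490}
change3  = {"p50": 1,  "p85": 3,  "p95": 6,  "p99": 17,  "p99.999": 130}

factors = {p: original[p] / change3[p] for p in original}
# p50 >= 25x, p99 ~ 7.5x, p99.999 ~ 3.8x: every band improves,
# but the extreme tail improves most slowly.
```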
Non-obvious outcomes¶
Every percentile improved at every round. No regression at
any level — evidence that the cluster was deep in the saturated
regime throughout the tuning. In the normal regime, raising
linger.ms would have hurt p50 while helping tail — the
monotonic improvement across all percentiles is the signature of
saturation-regime tuning.
The tail improves more slowly than the median. p50 dropped 25× across three rounds (25 ms → <1 ms); p99.999 dropped 3.8× (490 → 130 ms). The saturation-regime latency inversion primarily reclaims the median and mid-range percentiles; extreme-tail improvement requires cumulative rounds.
Network bandwidth dropped 48% at identical message rate (1.1 GB/sec → 575 MB/sec for 1.2 M msg/sec). Attributed to better compression and reduced Kafka-metadata overhead per batch. Not a pre-stated goal but a clean second-order gain.
Cluster consolidation became possible. Post-tuning CPU dropped to ~50% on one cluster — the two-cluster deployment was consolidated to one handling 2.5–2.7 M msg/sec (the pre-tuning 1.2 M × 2 clusters → post-tuning 2.6 M × 1 cluster, ~2.2× throughput per cluster).
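The two headline numbers above are internally consistent; a quick arithmetic check, with every figure taken straight from the text:

```python
# Checking the bandwidth and consolidation figures quoted above.
MSG_RATE = 1_200_000                        # msg/sec, unchanged across tuning

before_bps, after_bps = 1.1e9, 575e6        # 1.1 GB/sec -> 575 MB/sec
reduction = 1 - after_bps / before_bps      # ~0.48: the quoted 48% drop
bytes_per_msg_after = after_bps / MSG_RATE  # ~479 wire bytes per message

pre_per_cluster, post_per_cluster = 1.2e6, 2.6e6  # two clusters -> one
gain = post_per_cluster / pre_per_cluster          # ~2.2x throughput per cluster
```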
Structure by stage¶
Round 0 (baseline): measure.
Round 1: linger.ms × ~2. Measure percentile table + dashboard.
Expect: 30–50% latency improvement across percentiles,
effective batch size rising, scheduler backlog falling.
Round 2: linger.ms × ~5 vs. baseline. Measure.
Expect: further 2–3× latency improvement on median,
less on tail.
Round 3: linger.ms × ~10 vs. baseline, or topic-targeted fine-tune.
Expect: diminishing returns — if effective batch size is
now > 16 KB, stop.
Validation: cluster CPU well below saturation, scheduler backlog
near zero, per-topic effective batch size above 4 KB
for every topic.
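The validation gate above can be encoded as a single predicate. A minimal sketch — the 4 KB floor comes from the text, while the 70 % CPU ceiling and the "near zero" backlog threshold are assumed values you would set per cluster:

```python
# Sketch of the validation gate. The 4 KB per-topic batch floor is from
# the text; the 70% CPU ceiling and backlog threshold are assumptions.
def tuning_complete(cpu_util: float, backlog: int,
                    batch_bytes_by_topic: dict[str, int]) -> bool:
    """True when the cluster meets all three stop conditions."""
    return (cpu_util < 0.70                  # well below saturation
            and backlog < 10                 # scheduler backlog "near zero"
            and all(b > 4_096                # every topic above 4 KB
                    for b in batch_bytes_by_topic.values()))
```

A single lagging topic fails the gate, which is the cue to switch to topic-targeted tuning rather than another cluster-wide round.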
Why multi-round¶
One-shot tuning can't answer the question "is this the right value?" — there's no counterfactual. Three rounds generate a progression curve that shows whether diminishing returns have kicked in, which percentile bands are still responsive, and whether the cluster has exited the saturated regime.
The three-round structure is also rollback-safe: if round 3 regresses, round 2's settings are a known-good fallback.
Prerequisites¶
- Prometheus effective-batch-size dashboard operational before round 1.
- Per-topic tracking: concepts/per-topic-batch-diagnosis discipline — if one topic has pathologically low batches, target the tune there, not cluster-wide.
- Coordinated deploy window when producers are application-team-owned.
- Latency measurement infrastructure that can hold percentile-table observations across rounds.
Consequences¶
Positive:
- Order-of-magnitude tail-latency improvements possible (p99 7.5×, p50 25× in the case study).
- Network-bandwidth reduction at identical message rate.
- Cluster consolidation enabled by CPU headroom.
Negative / risks:
- Producer-side per-record latency rises in the normal regime — if the cluster leaves the saturated regime mid-tuning, subsequent rounds can start hurting instead of helping.
- Producer memory pressure rises with a bigger linger.ms and an unchanged buffer.memory — a third-dimension tuning surface.
- Tuning is multi-team when producers span services.
Seen in¶
- sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2 — canonical wiki source. 2024 Redpanda Cloud BYOC customer migration; three-round linger tuning; full percentile table with p50 / p85 / p95 / p99 / p99.999 deltas; 1.1 → 0.575 GB/sec network reduction; 2-cluster → 1-cluster consolidation.
Related¶
- concepts/batching-latency-tradeoff — normal-vs-saturated regime framing. This pattern is the saturated-regime execution playbook.
- concepts/effective-batch-size — target state: every topic above 4 KB.
- concepts/per-topic-batch-diagnosis — targeting discipline.
- concepts/tail-latency-at-scale — what this pattern reclaims.
- concepts/cpu-utilization-vs-saturation — the regime detector.
- patterns/prometheus-effective-batch-size-dashboard — instrumentation prerequisite.
- systems/redpanda, systems/kafka.