Offset-commit cost¶
Definition¶
Offset-commit cost names the often-overlooked broker-side tax
that consumer groups pay every time they commit their progress.
A consumer group's committed offset per partition is stored
in the internal __consumer_offsets topic; each commit is
therefore a produce write to that topic, handled by the
group coordinator broker.
A consumer group with P assigned partitions committing every
C seconds adds commit-write pressure at roughly P / C writes
per second, independent of its read rate R. Across many
consumer groups with frequent commits, the aggregate load on
__consumer_offsets and the group coordinators can rival
application-topic traffic.
Redpanda's save-button analogy (Source: sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda):
"When consuming a topic, committing your consumer group offsets is exactly like pressing the save button. You record where you've read to, and just like that save, each commit takes time and resources. If you commit too frequently, your consumer will be less efficient, but it can also start to impact the broker as your consume workload gradually transforms (somewhat unknowingly) into a consume AND produce workload, since each read is accompanied by a commit write."
The read-becomes-write transformation¶
The mechanical chain:
consumer.poll()
→ broker returns batch of records
→ application processes records
→ consumer.commit()
→ consumer sends commit RPC to group coordinator
→ coordinator produces offset record to __consumer_offsets
→ coordinator acks commit to consumer
Every "consume event" implicitly triggers a "produce event" — the consumer load is actually consume + produce, weighted by commit frequency. For a consumer committing every 100 ms, the broker sees 10 commit writes per second per partition assigned to that consumer.
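The weighting can be made concrete with a toy simulation (not a real client; the poll and commit cadences are made-up numbers) that counts fetch RPCs against the implied __consumer_offsets writes for a single partition over one second:

```python
def simulate_consume_produce(poll_ms, commit_ms, duration_ms):
    """Count consumer fetches vs. implied offset-commit writes for one partition."""
    fetches = commit_writes = 0
    next_commit = commit_ms
    for t in range(0, duration_ms, poll_ms):
        fetches += 1                    # consumer.poll() -> fetch RPC to the broker
        if t + poll_ms >= next_commit:  # auto-commit fires within this poll cycle
            commit_writes += 1          # coordinator appends to __consumer_offsets
            next_commit += commit_ms
    return fetches, commit_writes

# Committing every 100 ms: one second of consuming carries 10 produce writes.
fetches, writes = simulate_consume_produce(poll_ms=50, commit_ms=100, duration_ms=1000)
print(fetches, writes)  # 20 fetches, 10 commit writes
```

The ratio of the two counters is the "weighted by commit frequency" factor: halve the commit interval and the produce side doubles while the consume side is unchanged.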
The knobs¶
enable.auto.commit (default true)¶
If true, the consumer commits in the background every
auto.commit.interval.ms. The default auto-commit interval is
5 seconds — usually a reasonable starting point. The post's
advice:
"If using auto commit, set auto.commit.interval.ms to a reasonable value. Generally, one second or higher; the default is 5 seconds. Low milliseconds is right out!"
Low-ms auto-commit is a common anti-pattern; developers reach for it thinking they're minimising re-read on restart. They are — but the broker tax is severe.
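A sketch of the two knobs together, using librdkafka-style property names (as in confluent-kafka clients); the broker address and group name are placeholders, and 5000 ms is the documented client default:

```python
# Auto-commit configuration sketch (librdkafka-style keys).
# 5000 ms is the client default; low-millisecond values are the
# anti-pattern described above.
consumer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "group.id": "orders-service",           # hypothetical group name
    "enable.auto.commit": True,             # background commits (default)
    "auto.commit.interval.ms": 5000,        # keep >= 1000 per the post's advice
}
assert consumer_config["auto.commit.interval.ms"] >= 1000
```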
Manual commit¶
If enable.auto.commit=false, the consumer calls
consumer.commitSync() or consumer.commitAsync() explicitly.
The discipline:
"If manually committing in your application code, try to align your implied commit frequency to at least one second."
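One way to honor that discipline is a small time gate in front of the commitSync()/commitAsync() call. CommitThrottle below is a hypothetical helper sketched for illustration, not a client API:

```python
import time

class CommitThrottle:
    """Allow a manual commit at most once per min_interval_s seconds."""

    def __init__(self, min_interval_s=1.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self._last = float("-inf")  # so the very first commit is allowed

    def should_commit(self):
        now = self.clock()
        if now - self._last >= self.min_interval_s:
            self._last = now
            return True
        return False

# In the poll loop:
#     if throttle.should_commit():
#         consumer.commitAsync(...)
```

The consumer still processes every batch; only the commit RPCs are rate-limited, so the implied commit interval stays at or above one second.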
One consumer group per application¶
Shared consumer groups across services create group-coordinator contention and rebalance storms. Verbatim:
"Make sure each application or micro-service uses its own consumer group, otherwise your applications can inadvertently increase the load on a single group coordinator and make rebalances more costly and impactful."
Commit frequency = RPO dial¶
The consumer-side connection to RPO (recovery point objective):
On restart, a consumer resumes from its last committed offset.
If it was committing every C seconds when it crashed, up to
C seconds of records must be re-read. The commit interval is
therefore the consume-side RPO:
| Commit interval | Worst-case re-read on crash |
|---|---|
| 100 ms | 100 ms |
| 1 s | 1 s |
| 5 s (default) | 5 s |
| 60 s | 60 s |
Redpanda's verbatim framing:
"In a Disaster Recovery (DR) context, be aware of your Recovery Point Objectives (RPOs) and use those to help define your minimum commit frequency."
The corollary: if the application tolerates 10 s of re-read on
crash (idempotent processing, deduplicated downstream),
auto.commit.interval.ms=10000 is safe, halving commit load
relative to the default 5 s and cutting it by orders of
magnitude relative to low-millisecond commits.
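That corollary reduces to a rule of thumb. The helper below is illustrative, not from the post: the commit interval may grow up to the RPO budget, floored at the recommended one-second minimum:

```python
def commit_interval_for_rpo(rpo_ms, floor_ms=1000):
    """Largest safe auto.commit.interval.ms given a re-read (RPO) budget.

    Worst-case re-read after a crash is roughly one commit interval,
    so the interval may be as large as the RPO. The floor reflects the
    "one second or higher" guidance; a sub-second RPO conflicts with
    it, and here the floor wins for the broker's sake.
    """
    return max(floor_ms, rpo_ms)

print(commit_interval_for_rpo(10_000))  # 10000 -> fits a 10 s RPO budget
```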
Why "few seconds of re-read" is usually fine¶
The post's load-bearing argument against over-committing:
"Many folks try to commit excessively often to minimize re-reads during an application restart. While that initially sounds plausible, re-reading some amount of data occasionally is expected for most streaming applications, so if your application already has to cope with re-reading a few milliseconds of messages, it can probably cope with a few seconds worth."
Most streaming applications already tolerate some re-read (at-least-once delivery semantics, producer retries after broker timeouts, consumer rebalances). Extending that re-read window from a few ms to a few s is usually free — and buys an order-of-magnitude reduction in commit rate.
Quantifying the broker impact¶
For a consumer group with P assigned partitions committing
every C seconds, the commit rate is P/C writes per second
per group.
A hundred consumer groups each with 32 partitions committing
every 100 ms:
- Commit rate: 100 × 32 / 0.1 = 32,000 writes/sec to
__consumer_offsets.
Same workload committing every 5 s:
- Commit rate: 100 × 32 / 5 = 640 writes/sec.
50× reduction in commit load for a 50× longer potential re-read window. For most workloads, that's a trivially worthwhile trade.
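The arithmetic above as a one-liner, using the section's own hypothetical workload numbers:

```python
def offsets_topic_writes_per_sec(groups, partitions_per_group, commit_interval_s):
    # Each commit appends one offset record per assigned partition
    # to __consumer_offsets on the group coordinator.
    return groups * partitions_per_group / commit_interval_s

print(offsets_topic_writes_per_sec(100, 32, 0.1))  # 32000.0 writes/sec
print(offsets_topic_writes_per_sec(100, 32, 5))    # 640.0 writes/sec
```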
Seen in¶
- sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda — canonical wiki source. Save-button analogy; read-becomes-write framing; auto.commit.interval.ms ≥ 1 s rule; RPO-as-commit-frequency link; one-consumer-group-per-app rule.
Related¶
- systems/kafka, systems/redpanda — Kafka-API consumers.
- concepts/consumer-group — the coordination boundary.
- concepts/kafka-partition — commit granularity is per-partition.
- concepts/kafka-consumer-lag-metric — lag is measured against committed offsets, so commit cadence shapes the signal.
- concepts/consumer-fetch-tuning — the fetch-side tuning that composes with commit tuning.
- concepts/rpo-rto — the durability framing that sets the minimum commit frequency.
- concepts/fixed-vs-variable-request-cost — the substrate economics; each commit is a fixed-cost write.