Broker-side effective-batch-size observability¶
Definition¶
Broker-side effective-batch-size observability is the practice of measuring the effective batch size arriving at a streaming broker (as opposed to the producer's configured ceiling) by dividing a byte-rate metric by a batch-count metric emitted from the broker's own telemetry surface. The ratio, computed from Prometheus counters, is the canonical operations-team answer to "is our batching actually working?"
For Redpanda, the 2024-11-26 batch-tuning explainer names the four metrics plus one public sibling verbatim (Source: sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2):
- vectorized_storage_log_written_bytes — private; bytes written since process start.
- vectorized_storage_log_batches_written — private; batches written since process start.
- vectorized_scheduler_queue_length — private; the broker's internal backlog of tasks.
- redpanda_cpu_busy_seconds_total — public; CPU utilisation.
The ratio log_written_bytes / log_batches_written is the average effective batch size at the broker.
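The same ratio can be sketched outside PromQL. A minimal Python illustration (the function name is ours, not from the source) of what dividing the two counter deltas yields over a scrape interval:

```python
# Sketch (not from the source): the broker-side effective batch size from
# two scrapes of the cumulative counters, mirroring what
# irate(bytes) / irate(batches) computes in PromQL.

def effective_batch_size(bytes_t0, bytes_t1, batches_t0, batches_t1):
    """Average bytes per batch written between two counter samples."""
    delta_bytes = bytes_t1 - bytes_t0        # vectorized_storage_log_written_bytes delta
    delta_batches = batches_t1 - batches_t0  # vectorized_storage_log_batches_written delta
    if delta_batches == 0:
        return 0.0  # no batches written in the interval
    return delta_bytes / delta_batches

# Example: 64 MiB written across 4,096 batches -> 16 KiB per batch on average.
print(effective_batch_size(0, 64 * 1024 * 1024, 0, 4096))  # 16384.0
```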
Canonical PromQL templates¶
Per-topic effective batch size (bytes / batch):
```
sum(irate(vectorized_storage_log_written_bytes{topic!~"^_.*"}[5m])) by (topic)
/
sum(irate(vectorized_storage_log_batches_written{topic!~"^_.*"}[5m])) by (topic)
```
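If the ratio is wanted as a first-class series for dashboards and alerts, it can be precomputed with a Prometheus recording rule. A sketch only; the rule and group names are our own invention, the expression is the per-topic template above:

```yaml
groups:
  - name: redpanda-batching
    rules:
      - record: redpanda:effective_batch_size_bytes:topic
        expr: |
          sum(irate(vectorized_storage_log_written_bytes{topic!~"^_.*"}[5m])) by (topic)
          /
          sum(irate(vectorized_storage_log_batches_written{topic!~"^_.*"}[5m])) by (topic)
```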
Per-core batch write rate (batches / sec / core):
```
sum(irate(vectorized_storage_log_batches_written{topic!~"^_.*"}[5m])) by (cluster)
/
count(redpanda_cpu_busy_seconds_total{}) by (cluster)
```
Scheduler backlog:
CPU utilisation per pod/shard:
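Plausible shapes for these last two queries, built from the metrics named above — sketches under assumed label names, not the source's verbatim one-liners:

```
# Scheduler backlog (sketch; instance/shard labels assumed):
sum(vectorized_scheduler_queue_length) by (instance, shard)

# CPU utilisation per pod/shard (sketch): a busy-seconds counter rate
# yields a 0..1 utilisation fraction per series.
irate(redpanda_cpu_busy_seconds_total[5m])
```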
Why broker-side beats producer-side observability¶
The producer's configured batch.size and linger.ms are
ceilings, not descriptions. The producer's own metrics (e.g.
kafka.producer.record-send-rate) report send rates and record counts,
but not how records were aggregated into batches on the broker side.
The broker is the only vantage point where bytes / batches
reflects the effect of the full seven-factor
effective-batch-size pipeline
(message rate, partitioning, producer fan-out, buffer memory,
backpressure, etc.).
Why the topic!~"^_.*" filter¶
Internal Kafka/Redpanda topics (_schemas,
__consumer_offsets, __transaction_state) have different
byte/batch profiles than application traffic — typically very
small records with strict ordering requirements. Including them
in a bytes / batches average pulls the number down and hides
application-topic tuning signal.
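What the filter keeps and drops can be checked with an ordinary regex. A quick Python check (note that PromQL label matchers are fully anchored, so the leading ^ is redundant but harmless):

```python
import re

# The pattern from the PromQL templates: topics whose names start with "_".
internal = re.compile(r"^_.*")

topics = ["_schemas", "__consumer_offsets", "__transaction_state", "orders", "clicks"]

# topic!~"^_.*" keeps only the names that do NOT match the pattern,
# i.e. application topics.
kept = [t for t in topics if not internal.match(t)]
print(kept)  # ['orders', 'clicks']
```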
Seen in¶
- sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2 — canonical wiki source. Names the four private + one public Prometheus metric + the five PromQL one-liners + the production case study that validates them.
Related¶
- concepts/effective-batch-size — what this measures.
- concepts/per-topic-batch-diagnosis — why the by (topic) disaggregation is load-bearing.
- concepts/batching-latency-tradeoff — normal-vs-saturated regime framing; scheduler queue length confirms regime.
- patterns/prometheus-effective-batch-size-dashboard — dashboard-shape canonicalisation.
- systems/prometheus, systems/grafana, systems/redpanda.