
CONCEPT Cited by 1 source

Broker-side effective-batch-size observability

Definition

Broker-side effective-batch-size observability is the practice of measuring the effective batch size arriving at a streaming broker, as opposed to the producer's configured ceiling. It divides the rate of a bytes-written counter by the rate of a batches-written counter, both emitted from the broker's own telemetry surface. This ratio of Prometheus counter rates is the canonical operations-team answer to "is our batching actually working?"

For Redpanda, the 2024-11-26 batch-tuning explainer names three private metrics plus one public sibling verbatim (Source: sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2):

  • vectorized_storage_log_written_bytes — private; bytes written since process start.
  • vectorized_storage_log_batches_written — private; batches written since process start.
  • vectorized_scheduler_queue_length — private; broker's internal backlog of tasks.
  • redpanda_cpu_busy_seconds_total — public; CPU utilisation.

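The ratio log_written_bytes / log_batches_written is the average effective batch size at the broker. Computed over the raw counters, it is the lifetime average since process start. Computed over their per-second rates, as in the templates below, it is the average over the rate window.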
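The following minimal Python sketch shows the two readings of this ratio. It assumes the two counters have been scraped into a hypothetical Scrape record; the class and function names are illustrative and are not part of Redpanda or Prometheus. Dividing the deltas between two scrapes gives the same result as dividing the two per-second rates, because the elapsed time cancels out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scrape:
    """One scrape of the two broker counters (hypothetical container)."""
    timestamp_s: float
    written_bytes: float    # vectorized_storage_log_written_bytes
    batches_written: float  # vectorized_storage_log_batches_written


def lifetime_avg_batch_size(s: Scrape) -> Optional[float]:
    """Average bytes per batch since process start: the raw counter ratio."""
    if s.batches_written <= 0:
        return None
    return s.written_bytes / s.batches_written


def windowed_avg_batch_size(earlier: Scrape, later: Scrape) -> Optional[float]:
    """Average bytes per batch between two scrapes.

    (d_bytes / dt) / (d_batches / dt) == d_bytes / d_batches, so the ratio
    of rates needs no explicit time. Assumes the counters did not reset
    (e.g. a broker restart) between the two scrapes.
    """
    d_bytes = later.written_bytes - earlier.written_bytes
    d_batches = later.batches_written - earlier.batches_written
    if d_batches <= 0:
        return None
    return d_bytes / d_batches


first = Scrape(timestamp_s=0, written_bytes=1_000_000, batches_written=1_000)
second = Scrape(timestamp_s=60, written_bytes=4_000_000, batches_written=1_200)
print(lifetime_avg_batch_size(second))         # about 3333.3
print(windowed_avg_batch_size(first, second))  # 15000.0
```

Take a broker that wrote 1,000,000 bytes in 1,000 batches before the first scrape and 3,000,000 bytes in 200 batches between scrapes. Its lifetime figure is about 3,333 bytes, while the windowed figure is 15,000 bytes. The lifetime number hides the recent change, which is why the templates below work on rates.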

Canonical PromQL templates

Per-topic effective batch size (bytes / batch):

sum(irate(vectorized_storage_log_written_bytes{topic!~"^_.*"}[5m])) by (topic)
  /
sum(irate(vectorized_storage_log_batches_written{topic!~"^_.*"}[5m])) by (topic)
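The following Python sketch emulates the per-topic template over in-memory samples to make its mechanics concrete. It applies these steps in order:

1. irate() per series, from the last two samples, including irate()'s counter-reset handling.
2. The topic!~"^_.*" filter.
3. sum by (topic).
4. Division matched on topic.

The series layout (a label dict plus a list of (timestamp, value) pairs) and all function names are assumptions for illustration. In production, Prometheus evaluates the query itself.

```python
import re
from collections import defaultdict

# PromQL regex matchers are fully anchored, so fullmatch() mirrors topic!~"^_.*".
INTERNAL_TOPIC = re.compile(r"^_.*")


def irate(samples):
    """Per-second rate from the last two (timestamp_s, value) samples, like irate()."""
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    # A drop means the counter reset; irate() then takes the new value as the increase.
    delta = v1 if v1 < v0 else v1 - v0
    return delta / (t1 - t0)


def sum_irate_by_topic(series):
    """sum(irate(metric{topic!~"^_.*"}[window])) by (topic)."""
    totals = defaultdict(float)
    for labels, samples in series:
        topic = labels["topic"]
        if INTERNAL_TOPIC.fullmatch(topic):
            continue  # internal topic, excluded by the filter
        rate = irate(samples)
        if rate is not None:
            totals[topic] += rate
    return dict(totals)


def effective_batch_size_by_topic(bytes_series, batches_series):
    """Bytes per batch per topic; the vector division matches on the topic label."""
    byte_rates = sum_irate_by_topic(bytes_series)
    batch_rates = sum_irate_by_topic(batches_series)
    # PromQL would return NaN or +Inf on a zero divisor; this sketch drops the topic.
    return {
        topic: byte_rates[topic] / batch_rates[topic]
        for topic in byte_rates.keys() & batch_rates.keys()
        if batch_rates[topic] > 0
    }


bytes_series = [
    ({"topic": "orders", "partition": "0"}, [(0, 0), (15, 150_000)]),
    ({"topic": "orders", "partition": "1"}, [(0, 0), (15, 300_000)]),
    ({"topic": "_schemas", "partition": "0"}, [(0, 0), (15, 1_500)]),
]
batches_series = [
    ({"topic": "orders", "partition": "0"}, [(0, 0), (15, 15)]),
    ({"topic": "orders", "partition": "1"}, [(0, 0), (15, 15)]),
    ({"topic": "_schemas", "partition": "0"}, [(0, 0), (15, 15)]),
]
print(effective_batch_size_by_topic(bytes_series, batches_series))  # {'orders': 15000.0}
```

Summing per-partition rates before dividing weights each partition by its traffic. Averaging per-partition ratios would instead let a quiet partition count as much as a busy one.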

Per-core batch write rate (batches / sec / core):

sum(irate(vectorized_storage_log_batches_written{topic!~"^_.*"}[5m])) by (cluster)
  /
count(redpanda_cpu_busy_seconds_total{}) by (cluster)
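The denominator counts series rather than summing their values. Assuming redpanda_cpu_busy_seconds_total exports one series per shard and Redpanda runs one shard per core, count() by (cluster) yields the cluster's core count. The following Python sketch reproduces the same arithmetic. Its input shape (per-series batch rates with irate() and the topic filter already applied) and its names are hypothetical.

```python
from collections import Counter, defaultdict


def batches_per_sec_per_core(batch_rates, cpu_series_labels):
    """Per-cluster batches/sec divided by per-cluster core count.

    batch_rates: (labels, batches_per_sec) pairs, irate() and topic filter
        already applied (hypothetical input shape).
    cpu_series_labels: label sets of the redpanda_cpu_busy_seconds_total
        series; counting them per cluster stands in for count() by (cluster).
    """
    totals = defaultdict(float)
    for labels, rate in batch_rates:
        totals[labels["cluster"]] += rate  # sum(...) by (cluster)
    cores = Counter(labels["cluster"] for labels in cpu_series_labels)
    # One-to-one matching: clusters missing on either side drop out.
    return {c: totals[c] / cores[c] for c in totals if cores[c] > 0}


batch_rates = [
    ({"cluster": "prod", "topic": "orders"}, 900.0),
    ({"cluster": "prod", "topic": "payments"}, 300.0),
]
cpu_series = [
    {"cluster": "prod", "pod": pod, "shard": str(shard)}
    for pod in ("redpanda-0", "redpanda-1")
    for shard in range(4)
]
print(batches_per_sec_per_core(batch_rates, cpu_series))  # {'prod': 150.0}
```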

Scheduler backlog:

sum(vectorized_scheduler_queue_length{}) by (cluster, group)
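vectorized_scheduler_queue_length is presumably a gauge: an instantaneous queue depth rather than a running total. That is why this template sums it directly instead of wrapping it in irate(). The following minimal Python sketch shows that sum ... by (cluster, group) aggregation. The group values in the example are hypothetical.

```python
from collections import defaultdict


def sum_by(samples, *keys):
    """Emulate PromQL sum(...) by (keys) over instant (labels, value) samples.

    A label missing from a series groups as the empty string, as in PromQL.
    """
    out = defaultdict(float)
    for labels, value in samples:
        out[tuple(labels.get(k, "") for k in keys)] += value
    return dict(out)


# Hypothetical queue-length readings from two shards; the group names are made up.
queue_lengths = [
    ({"cluster": "prod", "shard": "0", "group": "main"}, 4),
    ({"cluster": "prod", "shard": "1", "group": "main"}, 3),
    ({"cluster": "prod", "shard": "0", "group": "raft"}, 2),
]
print(sum_by(queue_lengths, "cluster", "group"))
# {('prod', 'main'): 7.0, ('prod', 'raft'): 2.0}
```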

CPU utilisation per pod/shard:

avg(deriv(redpanda_cpu_busy_seconds_total{}[5m])) by (pod, shard)
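deriv() estimates a per-second slope using simple linear regression over every sample in the window. Applied to a busy-seconds counter, that slope is busy seconds per wall-clock second, i.e. the fraction of time the shard was busy (0.5 means 50%). The following Python sketch performs that calculation and the avg by (pod, shard) step; the function names are illustrative. The Prometheus documentation intends deriv() for gauges and rate() for counters. rate() also compensates for counter resets (for example, a broker restart), which the regression here does not.

```python
from collections import defaultdict
from statistics import mean


def deriv(samples):
    """Least-squares slope, per second, of (timestamp_s, value) samples."""
    if len(samples) < 2:
        return None
    t_bar = mean(t for t, _ in samples)
    v_bar = mean(v for _, v in samples)
    num = sum((t - t_bar) * (v - v_bar) for t, v in samples)
    den = sum((t - t_bar) ** 2 for t, _ in samples)
    return num / den if den else None


def avg_utilisation_by_pod_shard(series):
    """avg(deriv(redpanda_cpu_busy_seconds_total[window])) by (pod, shard)."""
    slopes = defaultdict(list)
    for labels, samples in series:
        slope = deriv(samples)
        if slope is not None:
            slopes[(labels["pod"], labels["shard"])].append(slope)
    return {key: sum(vals) / len(vals) for key, vals in slopes.items()}


# A shard that accrues 7.5 busy seconds every 15 s is 50% busy.
busy = [(0, 100.0), (15, 107.5), (30, 115.0), (45, 122.5)]
print(avg_utilisation_by_pod_shard([({"pod": "redpanda-0", "shard": "0"}, busy)]))
# {('redpanda-0', '0'): 0.5}
```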

Why broker-side beats producer-side observability

The producer's configured batch.size and linger.ms are ceilings, not descriptions. The producer's own metrics (e.g. kafka.producer.record-send-rate) can show sent-record counts but not how records aggregated into batches on the broker side. The broker is the only vantage point where bytes / batches reflects the effect of the full seven-factor effective-batch-size pipeline (message rate, partitioning, producer fan-out, buffer memory, backpressure, etc.).
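The following deliberately simplified toy model shows why batch.size is only a ceiling. It rests on four assumptions:

- Traffic is steady and spread evenly.
- All messages have the same size, no larger than batch.size.
- A batch closes only when it fills batch.size or when linger.ms expires.
- Buffer memory, backpressure and the other factors named above can be ignored.

It is not how any real producer computes batches, and the function and parameter names are hypothetical.

```python
def toy_effective_batch_bytes(total_msgs_per_s, msg_bytes, producers,
                              partitions, linger_ms, batch_size):
    """Toy estimate of bytes per batch for one producer/partition pair.

    Assumes traffic is spread evenly over every producer x partition pair,
    that all messages have the same size (msg_bytes <= batch_size), and that
    a batch closes when it reaches batch_size bytes or linger_ms expires,
    whichever comes first. Buffer memory and backpressure are ignored.
    """
    msgs_per_s_per_pair = total_msgs_per_s / (producers * partitions)
    msgs_per_linger = msgs_per_s_per_pair * linger_ms / 1000.0
    # A batch holds at least one message and never exceeds the ceiling.
    return min(batch_size, max(1.0, msgs_per_linger) * msg_bytes)


# Same traffic and producer config, different fan-out:
print(toy_effective_batch_bytes(10_000, 500, 10, 100, 5, 16_384))  # 500.0
print(toy_effective_batch_bytes(10_000, 500, 1, 1, 5, 16_384))     # 16384
```

The first call models 10,000 msg/s of 500-byte messages spread over 10 producers and 100 partitions. Each producer-partition pair then sees about 10 msg/s, so a 5 ms linger closes batches that hold a single message (500 bytes), despite the 16,384-byte batch.size. The second call concentrates the same traffic on one producer and one partition, and the batch reaches the ceiling. Both producers share identical configuration, so only a broker-side measurement would reveal the difference.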

Why the topic!~"^_.*" filter

Internal Kafka/Redpanda topics (_schemas, __consumer_offsets, __transaction_state) have byte/batch profiles different from those of application traffic: typically very small records with strict ordering requirements. Including them in a bytes / batches average pulls the number down and hides the tuning signal from application topics.
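The following small Python illustration shows the dilution effect with made-up per-topic rates. The "orders" topic and all numbers are hypothetical. The blended figure is total bytes over total batches, so a topic that contributes many small batches drags it down far more than its byte volume suggests.

```python
import re

INTERNAL_TOPIC = re.compile(r"^_.*")  # same pattern as topic!~"^_.*"


def avg_batch_size(rates, include_internal):
    """Total bytes/s over total batches/s across the selected topics."""
    selected = [
        (b, n) for topic, b, n in rates
        if include_internal or not INTERNAL_TOPIC.fullmatch(topic)
    ]
    total_batches = sum(n for _, n in selected)
    if total_batches == 0:
        return None
    return sum(b for b, _ in selected) / total_batches


# (topic, bytes/s, batches/s): made-up figures for illustration.
rates = [
    ("orders", 30_000.0, 2.0),
    ("__consumer_offsets", 2_000.0, 20.0),
]
print(avg_batch_size(rates, include_internal=False))  # 15000.0
print(avg_batch_size(rates, include_internal=True))   # about 1454.5
```

In this example the internal topic carries about 6% of the bytes, yet it pulls the blended average down roughly tenfold.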

Seen in
