
PATTERN

Zero injection for sparse counters

A transparent fix for Prometheus' silent undercounting of sparse counters under rate() / increase(). The aggregation tier, on the first flush of each counter series, emits a synthetic zero sample (timestamped slightly before what would otherwise be the first real sample) instead of the actual running total; the real first total follows on the next flush. This seeds Prometheus with the zero baseline its rate() implementation implicitly assumes.

The problem (why this is needed)

In Prometheus, a counter sample is a cumulative total since the series was created, and rate() derives change from consecutive samples, so it needs at least two. A counter series is typically created by its first increment, meaning the first sample already carries a nonzero value. If the counter is reset (pod restart, aggregator restart, scale-down) before a second increment, rate() sees a single sample, has no delta to compute, and the first increment is silently lost.

1 1 1 1 1 - - - 1
          ↑ reset — delta to first "1" after reset is also lost
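The failure mode in the diagram can be reproduced with a small simulation. This is a simplified model of Prometheus' increase() semantics (sum of deltas between consecutive samples, with a drop treated as a counter reset); it omits the real implementation's range extrapolation, and simple_increase is an illustrative name, not a Prometheus API.

```python
def simple_increase(samples):
    # Simplified model of Prometheus' increase(): sum the deltas between
    # consecutive samples, treating any drop as a counter reset (the delta
    # then becomes the new value). The real implementation also extrapolates
    # to the range boundaries; that detail is omitted here.
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total

# A brand-new series: the first sample already carries the increment,
# but with a single sample there is no pair to diff.
print(simple_increase([1]))                    # 0.0, real answer is 1.0

# The diagram above: 1 1 1 1 1, reset, then 1. The post-reset total equals
# the pre-reset total, so no reset is detected and the increment vanishes.
print(simple_increase([1, 1, 1, 1, 1, 1]))     # 0.0, real answer is 1.0

# With an injected zero seeding each (re)started series, the delta appears:
print(simple_increase([0, 1]))                 # 1.0
print(simple_increase([1, 1, 1, 1, 1, 0, 1]))  # 1.0
```

Note the second case: even when both the pre-reset and post-reset samples are in the query window, the increment is invisible whenever the post-reset total has not yet overtaken the pre-reset total.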

In StatsD, every flush is a delta in its own right, so the first increment is never lost.

Airbnb found this wasn't rare: their workloads generate many high-cardinality, low-rate counters — e.g. requests broken down by currency × user × region — where individual series increment only a handful of times per day. The edge case is the common case. It blocked migration progress until solved. (Source: sources/2026-04-16-airbnb-statsd-to-otel-metrics-pipeline)

The solution

Inside the aggregator (see systems/vmagent):

  1. On the first flush of a given output counter series, emit a zero (with a timestamp slightly before what would otherwise be the first real sample, to avoid collision).
  2. On all subsequent flushes, emit the real running total.

Now rate() always has a zero baseline to diff against, so the very first real increment is captured correctly.
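The two-step rule above can be sketched as a per-series flush decision. This is a minimal illustration, not vmagent's actual implementation; CounterFlusher, flush, and the (series, value, timestamp) tuple shape are all hypothetical names.

```python
class CounterFlusher:
    # Hypothetical sketch of zero injection inside an aggregation tier.
    def __init__(self, backdate_ms=1):
        self.seen = set()              # series keys that have flushed before
        self.backdate_ms = backdate_ms  # offset so the zero sorts before real samples

    def flush(self, series_key, running_total, now_ms):
        """Return the samples to emit for one series on one flush tick."""
        if series_key not in self.seen:
            self.seen.add(series_key)
            # First flush: emit a synthetic zero instead of the running total,
            # timestamped slightly earlier to avoid colliding with later
            # samples. The real total follows on the next flush.
            return [(series_key, 0.0, now_ms - self.backdate_ms)]
        return [(series_key, running_total, now_ms)]

f = CounterFlusher()
print(f.flush('requests{currency="EUR"}', 1.0, now_ms=1000))
# [('requests{currency="EUR"}', 0.0, 999)]
print(f.flush('requests{currency="EUR"}', 1.0, now_ms=2000))
# [('requests{currency="EUR"}', 1.0, 2000)]
```

The state reset matters: if the aggregator restarts, `seen` is empty again, so every series re-seeds with a zero after its reset, which is exactly when the baseline is needed.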

Trade-offs

  • One-flush-interval lag on the first visible increment. In practice this is irrelevant.
  • Small extra sample per counter series on creation.
  • Fix lives in the aggregation tier, not per callsite → invisible to app teams, no PromQL hacks, no gauge-instead-of-counter lies.

Why this won over alternatives

Rejected option → why it failed:

  • Pre-initialize all counters to zero at app startup → can't enumerate label combinations ahead of time.
  • Tell teams to use logs for exact counts → different system, different latency, breaks alerting UX.
  • Emit gauges instead of counters → against Prometheus conventions; gauges and counters are stored the same internally, but semantic expectations differ.
  • Pad queries with PromQL hacks → pushes complexity onto every dashboard/alert owner.

The centralized streaming-aggregation tier (concepts/streaming-aggregation) was the right single place to fix the semantic gap, illustrating the general principle: solve backend quirks in the pipeline, not in users' queries.
