CONCEPT Cited by 1 source

Streaming aggregation

Streaming aggregation is the pattern of aggregating metrics in transit — as samples flow from producers to storage — instead of querying raw samples and rolling them up at read time. The aggregator keeps running totals per output identity in memory and flushes them on a fixed interval; raw (pre-aggregation) samples are discarded or sampled, not stored.
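The pattern can be sketched in a few lines: running totals keyed by the output identity (metric name plus the labels that survive aggregation), flushed on a fixed interval. The class and method names below are hypothetical, not any real aggregator's API.

```python
# Minimal sketch of a streaming aggregator (hypothetical names): running
# totals are keyed by the output identity -- the label set that survives
# aggregation -- and flushed on a fixed interval. Raw samples are folded
# in and then discarded, never stored.
from collections import defaultdict

class StreamingAggregator:
    def __init__(self, drop_labels):
        self.drop_labels = set(drop_labels)   # labels aggregated away, e.g. {"pod"}
        self.totals = defaultdict(float)      # in-memory running totals

    def output_identity(self, name, labels):
        # Identity = metric name + all labels except the dropped ones.
        kept = tuple(sorted((k, v) for k, v in labels.items()
                            if k not in self.drop_labels))
        return (name, kept)

    def ingest(self, name, labels, value):
        # Fold the raw sample into its running total, then drop it.
        self.totals[self.output_identity(name, labels)] += value

    def flush(self):
        # Called on the flush interval; emits one sample per output series.
        out = dict(self.totals)
        self.totals.clear()
        return out

agg = StreamingAggregator(drop_labels={"pod"})
agg.ingest("http_requests_total", {"service": "api", "pod": "api-1"}, 3)
agg.ingest("http_requests_total", {"service": "api", "pod": "api-2"}, 2)
print(agg.flush())  # two pods collapse into one output series for service=api
```

Note how the two per-pod series collapse into a single output series: this is where the cardinality reduction happens.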

Why it matters

Storage-bound metrics systems (Prometheus, VictoriaMetrics, vendor TSDBs) typically bill or consume resources in proportion to ingested samples and active series. Dropping per-instance labels (pod, host, replica) before storage can collapse cardinality by 1–3 orders of magnitude with no user-visible impact, provided users didn't actually need per-instance slicing.

Alternatives evaluated by Airbnb and why they didn't fit:

  • Prometheus recording rules: run post-storage; require the raw high-cardinality data to be ingested first, which defeats the cost goal.
  • Query-time aggregation: fine for ad hoc exploration, terrible for hot dashboards and alerts — same query runs thousands of times/day.
  • Agent-local aggregation (e.g., in the OTel Collector per pod): can't aggregate across pods; doesn't collapse the per-instance dimension you most want to drop.

(Source: sources/2026-04-16-airbnb-statsd-to-otel-metrics-pipeline)

Sharding requirement

For streaming aggregation to scale beyond a single aggregator, samples that share a post-aggregation identity must land on the same shard. This is done by consistent-hashing on all labels except the ones being aggregated away — so every sample that will collapse into a given output series goes to exactly one aggregator and its running total is correct. See systems/vmagent for Airbnb's router + aggregator split implementing this.
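A minimal sketch of that routing rule, assuming simple hash-mod-N placement (real consistent hashing uses a ring so that resharding moves few keys, and vmagent's actual implementation differs):

```python
# Sketch of the sharding rule (illustrative, not vmagent's code): hash all
# labels EXCEPT the ones being aggregated away, so every sample that will
# collapse into a given output series lands on exactly one aggregator.
import hashlib

def shard_for(name, labels, drop_labels, num_shards):
    kept = sorted((k, v) for k, v in labels.items() if k not in drop_labels)
    key = name + "|" + "|".join(f"{k}={v}" for k, v in kept)
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Samples differing only in the dropped label hash to the same shard,
# so their running total is maintained in one place:
a = shard_for("http_requests_total", {"service": "api", "pod": "api-1"}, {"pod"}, 8)
b = shard_for("http_requests_total", {"service": "api", "pod": "api-2"}, {"pod"}, 8)
assert a == b
```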

State and failure model

The aggregator is stateful: in-memory running totals exist only on its pod. That drives several concerns:

  • Stable addressing. Routers need sticky routing to the same shard; Airbnb uses a Kubernetes StatefulSet with a static hostname list on the router CLI (no service-discovery dependency).
  • Restart gaps. When an aggregator restarts, in-flight running totals reset. With cumulative output temporality this shows as a counter reset (correctly detected by rate() if there's a zero baseline — see patterns/zero-injection-counter); with delta output temporality it shows as a gap.
  • The flush interval bounds both data freshness and the blast radius of a restart: at most one interval's worth of running totals is lost.
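The cumulative-temporality restart behavior can be illustrated with a toy `increase` helper that mirrors how PromQL's rate()/increase() treat a decrease as a counter reset (hypothetical code, not Prometheus's implementation):

```python
# Why a restart shows up as a counter reset with cumulative output
# temporality: the running total restarts from zero, so rate-style logic
# must treat a decrease as a reset, not a negative delta.
def increase(samples):
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:               # counter went down: aggregator restarted
            total += cur    # treat post-restart value as growth from zero
    return total

# Cumulative totals flushed each interval; the aggregator restarts after
# the third flush, dropping the series from 30 back toward zero.
series = [10, 20, 30, 5, 15]
print(increase(series))  # prints 35.0: the reset is absorbed, not counted negative
```

The reset heuristic only works if the post-restart values stay below the pre-restart total long enough to be observed, which is why the zero baseline mentioned above matters for sparse counters.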

Centralized aggregation as a control point

A centralized streaming-aggregation tier becomes a natural place to do other metric-wide transformations:

  • Drop metrics from a bad rollout without a code change.
  • Conditionally dual-emit raw samples for debugging.
  • Fix semantic gotchas like sparse-counter undercounting (patterns/zero-injection-counter).
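As a sketch of what such a control point looks like, drop and dual-emit rules can be plain data evaluated per metric name, so they change without a code deploy (the rule format and metric names here are hypothetical):

```python
# Sketch of the aggregation tier as a control point (hypothetical rule
# format): rules are plain data, so a bad rollout's metrics can be cut
# without a code change, and raw samples dual-emitted for debugging.
import re

DROP_RULES = [r"^debug_.*", r"^payments_v2_.*"]   # e.g. metrics from a bad rollout
DUAL_EMIT_RAW = {"checkout_latency_ms"}           # raw samples kept for debugging

def route(name):
    if any(re.match(p, name) for p in DROP_RULES):
        return "drop"
    if name in DUAL_EMIT_RAW:
        return "aggregate+raw"
    return "aggregate"

assert route("payments_v2_errors_total") == "drop"
assert route("checkout_latency_ms") == "aggregate+raw"
assert route("http_requests_total") == "aggregate"
```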
