vmagent (VictoriaMetrics)

vmagent is the lightweight metrics agent from the VictoriaMetrics project. It scrapes and/or receives Prometheus-format metrics, applies transformations (including streaming aggregation; see concepts/streaming-aggregation), and forwards them to one or more remote-write endpoints. Its codebase is small (~10K LOC) and designed to be understandable and forkable.

Why orgs pick it for an aggregation tier

  • Native streaming aggregation for Prometheus metrics (sum / rate / histogram merging) — no need to store raw series first.
  • Sharding support, which allows the tier to scale horizontally.
  • Good docs; small, readable codebase; easy to patch.
  • Runs as a regular process / Pod; no exotic dependencies.
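
To make the "no need to store raw series first" point concrete, here is a minimal sketch of streaming sum aggregation. It is not vmagent's implementation; the class name, the function name, and the dropped labels (`pod`, `host`) are assumptions chosen for illustration. Each incoming sample is folded straight into a running total for its output series, so memory grows with the number of output series rather than the number of raw samples.

```python
from collections import defaultdict

# Hypothetical labels to aggregate away; a real deployment would configure these.
DROPPED_LABELS = {"pod", "host"}


def output_key(name, labels):
    """Identity of the output series: metric name plus the labels that survive."""
    kept = tuple(sorted((k, v) for k, v in labels.items() if k not in DROPPED_LABELS))
    return (name, kept)


class SumAggregator:
    """Keeps one running total per output series; raw samples are never stored."""

    def __init__(self):
        self.totals = defaultdict(float)

    def push(self, name, labels, value):
        # Fold the sample into its output series immediately.
        self.totals[output_key(name, labels)] += value

    def flush(self):
        """Return the totals for the finished interval and start a new one."""
        out, self.totals = dict(self.totals), defaultdict(float)
        return out


agg = SumAggregator()
agg.push("requests", {"service": "api", "pod": "api-0"}, 3)
agg.push("requests", {"service": "api", "pod": "api-1"}, 4)
agg.push("requests", {"service": "web", "pod": "web-0"}, 1)
flushed = agg.flush()  # two output series: service=api and service=web
```

In a long-running process `flush()` would be called on a timer and its result forwarded to the remote-write endpoint; the sketch leaves that wiring out.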

(Source: sources/2026-04-16-airbnb-statsd-to-otel-metrics-pipeline)

Airbnb's two-tier deployment

  • Routers (stateless): receive incoming OTLP/Prometheus samples and consistent-hash on all labels except the ones being aggregated away (e.g. pod, host). This pins every sample of the same post-aggregation identity to a single aggregator shard.
  • Aggregators (stateful): deployed as a Kubernetes StatefulSet (stable network identity). Maintain in-memory running totals per output series; flush on a fixed interval.
  • Service discovery: routers take a static list of aggregator hostnames on the command line, leveraging StatefulSet DNS. Avoids an extra discovery dependency and keeps sharding deterministic.
  • Scale at Airbnb: single prod cluster with hundreds of aggregators, 100M+ samples/sec ingest. (Source: sources/2026-04-16-airbnb-statsd-to-otel-metrics-pipeline)
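
The router's shard selection can be sketched as follows. The source does not say which consistent-hashing scheme is used, so this sketch assumes rendezvous (highest-random-weight) hashing as one simple option. The hostnames follow the general Kubernetes StatefulSet DNS pattern (`<pod>.<service>.<namespace>.svc.cluster.local`), but the specific names, the shard count, and the helper functions are hypothetical.

```python
import hashlib

# Labels removed by aggregation; they must not influence routing.
AGGREGATED_AWAY = {"pod", "host"}

# Hypothetical static shard list, relying on stable StatefulSet DNS names.
AGGREGATORS = [
    f"aggregator-{i}.aggregator.metrics.svc.cluster.local" for i in range(4)
]


def routing_key(name, labels):
    """Canonical string for the post-aggregation identity of a sample."""
    kept = sorted((k, v) for k, v in labels.items() if k not in AGGREGATED_AWAY)
    return name + "{" + ",".join(f'{k}="{v}"' for k, v in kept) + "}"


def pick_aggregator(name, labels, shards=AGGREGATORS):
    """Rendezvous hashing: score every shard for this key, take the highest."""
    key = routing_key(name, labels)

    def score(shard):
        return hashlib.sha256(f"{shard}|{key}".encode()).digest()

    return max(shards, key=score)


# Same service, different pod/host: both samples land on the same shard.
a = pick_aggregator("requests", {"service": "api", "pod": "api-0", "host": "h1"})
b = pick_aggregator("requests", {"service": "api", "pod": "api-7", "host": "h9"})
```

Because the routing key ignores `pod` and `host`, every sample that contributes to one output series reaches the same aggregator, which is what lets each aggregator keep a complete running total without coordinating with its peers.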

Customizations Airbnb made

Side benefits of a centralized aggregation tier

Once all metrics flow through a single sharded tier, it becomes a natural metric-level control point:

  • Drop problematic metrics on the fly when a bad instrumentation change ships.
  • Temporarily dual-emit raw (pre-aggregation) metrics for debugging.
  • Systematic fixes for semantic gotchas (zero injection for counters, etc.) live here instead of leaking into user queries.
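
As a rough illustration of the first two controls, the sketch below decides per sample whether to drop it, aggregate it, or additionally emit it raw. The metric names, the `DROP_METRICS` and `DUAL_EMIT_METRICS` sets, and the `route_sample` function are all hypothetical; the source does not describe how Airbnb configures these controls.

```python
# Hypothetical control sets; in practice these would be reloadable config.
DROP_METRICS = {"debug_noisy_metric"}
DUAL_EMIT_METRICS = {"checkout_latency"}


def route_sample(name, labels, value):
    """Return (destination, sample) pairs for one incoming sample."""
    sample = (name, labels, value)
    if name in DROP_METRICS:
        return []  # dropped before it costs anything downstream
    out = [("aggregate", sample)]
    if name in DUAL_EMIT_METRICS:
        out.append(("raw", sample))  # temporary pre-aggregation copy for debugging
    return out


dropped = route_sample("debug_noisy_metric", {"pod": "a"}, 1.0)
normal = route_sample("requests", {"pod": "a"}, 1.0)
dual = route_sample("checkout_latency", {"pod": "a"}, 0.2)
```

Keeping these decisions in one function on the aggregation tier means a bad metric can be suppressed by changing one set rather than redeploying the services that emit it.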

Seen in
