vmagent (VictoriaMetrics)¶
vmagent is the lightweight metrics agent from the VictoriaMetrics project. It scrapes and/or receives Prometheus-format metrics, applies transformations (including streaming aggregation; see concepts/streaming-aggregation), and forwards them to one or more remote-write endpoints. Its codebase is small (~10K LOC) and designed to be understandable and forkable.
Why orgs pick it for an aggregation tier¶
- Native streaming aggregation for Prometheus metrics (sum / rate / histogram merging) — no need to store raw series first.
- Sharding support → horizontal scale.
- Good docs; small, readable codebase; easy to patch.
- Runs as a regular process / Pod; no exotic dependencies.
(Source: sources/2026-04-16-airbnb-statsd-to-otel-metrics-pipeline)
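To make "no need to store raw series first" concrete, here is a minimal sketch of a streaming sum aggregation. It is not vmagent's implementation: the class name `SumAggregator`, the `drop` parameter, and the sample labels are hypothetical, chosen only to show running totals being kept per output series and handed off on flush.

```python
from collections import defaultdict


def output_key(labels, drop):
    """Identity of the output series: every label except the dropped ones."""
    return tuple(sorted((k, v) for k, v in labels.items() if k not in drop))


class SumAggregator:
    """Hypothetical streaming 'sum' aggregator (illustration only)."""

    def __init__(self, drop):
        self.drop = frozenset(drop)
        self.totals = defaultdict(float)

    def add(self, labels, value):
        # Fold the sample into a running total; the raw sample is not kept.
        self.totals[output_key(labels, self.drop)] += value

    def flush(self):
        # Hand back the totals for this interval and start a fresh one.
        out, self.totals = dict(self.totals), defaultdict(float)
        return out


agg = SumAggregator(drop={"pod", "host"})
agg.add({"__name__": "requests", "service": "api", "pod": "a"}, 3)
agg.add({"__name__": "requests", "service": "api", "pod": "b"}, 4)
flushed = agg.flush()  # one output series for service=api, summed over pods
```

Memory use in a sketch like this grows with the number of output series rather than with the number of input samples, which is the property that makes an aggregation tier cheaper than storing raw series.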
Airbnb's two-tier deployment¶
- Routers (stateless): receive incoming OTLP/Prometheus samples and consistent-hash on all labels except the ones being aggregated away (e.g. pod, host). This pins every sample of the same post-aggregation identity to a single aggregator shard.
- Aggregators (stateful): deployed as a Kubernetes StatefulSet (stable network identity). Maintain in-memory running totals per output series; flush on a fixed interval.
- Service discovery: routers take a static list of aggregator hostnames on the command line, leveraging StatefulSet DNS. Avoids an extra discovery dependency and keeps sharding deterministic.
- Scale at Airbnb: single prod cluster with hundreds of aggregators, 100M+ samples/sec ingest. (Source: sources/2026-04-16-airbnb-statsd-to-otel-metrics-pipeline)
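The router's shard choice can be sketched as follows. The source does not say which consistent-hashing scheme the routers use, so this example swaps in rendezvous (highest-random-weight) hashing as one simple option. The hostnames, the `AGGREGATED_AWAY` tuple, and the function names are hypothetical; the hostnames only follow the usual StatefulSet DNS pattern of `<pod>.<service>.<namespace>.svc.cluster.local`.

```python
import hashlib

# Hypothetical static target list, as a router might receive it at startup.
AGGREGATORS = [
    f"vmagent-agg-{i}.vmagent-agg.metrics.svc.cluster.local" for i in range(4)
]
AGGREGATED_AWAY = ("pod", "host")  # labels the aggregators will drop


def shard_key(labels, aggregated_away=AGGREGATED_AWAY):
    """Serialize only the labels that survive aggregation, in sorted order."""
    kept = sorted((k, v) for k, v in labels.items() if k not in aggregated_away)
    return "\x00".join(f"{k}={v}" for k, v in kept).encode()


def pick_aggregator(labels, targets=AGGREGATORS, aggregated_away=AGGREGATED_AWAY):
    """Rendezvous hashing: the target with the highest hash(key, target) wins."""
    key = shard_key(labels, aggregated_away)

    def weight(target):
        return hashlib.sha256(key + b"\x00" + target.encode()).digest()

    return max(targets, key=weight)


base = {"__name__": "rpc_count", "service": "search"}
a = pick_aggregator({**base, "pod": "p1", "host": "h1"})
b = pick_aggregator({**base, "pod": "p2", "host": "h9"})
# a and b are the same shard: pod and host never enter the hash.
```

Because the excluded labels never reach the hash, every sample that collapses into the same output series lands on one aggregator, so that aggregator can keep the complete running total locally.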
Customizations Airbnb made¶
- Native histogram support.
- Mimir-style multitenancy.
- Generic fixes contributed upstream: VictoriaMetrics PRs #5931, #5938, #5990.
- Zero injection for sparse counters (see patterns/zero-injection-counter).
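This page does not describe how zero injection works, so the following is only a sketch under a stated assumption: the first time a counter series is flushed, an explicit zero sample is emitted one interval earlier, so that rate-style queries can see the initial jump from 0. The class `ZeroInjector` and its `emit` signature are hypothetical and are not taken from Airbnb's patch.

```python
class ZeroInjector:
    """Hypothetical zero injection for counters (assumed behavior, see lead-in)."""

    def __init__(self):
        self.seen = set()

    def emit(self, series, timestamp, value, interval):
        """Return the (timestamp, value) samples to write for one flush."""
        if series in self.seen:
            return [(timestamp, value)]
        # First appearance: prepend a zero one flush interval earlier.
        self.seen.add(series)
        return [(timestamp - interval, 0.0), (timestamp, value)]


inj = ZeroInjector()
first = inj.emit("errors{service=api}", 100, 5.0, 10)
later = inj.emit("errors{service=api}", 110, 7.0, 10)
```

Under this assumption a counter that appears for the first time with value 5 would show an increase of 5 instead of presenting 5 as a starting baseline with no visible increase.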
Side benefits of a centralized aggregation tier¶
Once all metrics flow through a single sharded tier, it becomes a natural metric-level control point:
- Drop problematic metrics on the fly when a bad instrumentation change ships.
- Temporarily dual-emit raw (pre-aggregation) metrics for debugging.
- Systematic fixes for semantic gotchas (zero injection for counters, etc.) live here instead of leaking into user queries.
Seen in¶
- sources/2026-04-16-airbnb-statsd-to-otel-metrics-pipeline — Airbnb replaced a Veneur fork with a sharded vmagent tier scaling to 100M+ samples/sec, and added zero injection for sparse counters.