Skip to content

CONCEPT Cited by 1 source

Metric-type metadata

An explicit, authoritative record of each metric's type (counter / gauge / histogram / summary) and unit, stored as metadata rather than inferred from naming conventions. Required whenever metric names are kept stable across systems where type-from-name heuristics no longer apply.

Why it's needed

Prometheus and similar systems traditionally infer types from suffixes (_total → counter, _bucket → histogram component, _seconds → unit). Two things break this:

  1. Legacy names kept for migration continuity. Renaming every metric to match Prometheus conventions would desync application code from telemetry during a multi-year migration (what Airbnb faced).
  2. Heterogeneous origin. Metrics emitted from multiple instrumentation libraries, SDKs, or historical vendor agents don't share a naming convention.

Without ground truth for type, translators and query-authoring tools (AI or human) can't reliably choose between sum / rate / histogram_quantile / increase / etc.

Airbnb's implementation

Airbnb built a metadata engine inside the translation layer that:

  • Periodically scans all known metrics across the platform.
  • Uses an internal label, _otel_metric_type_, as the source of truth for metric type (and related labels for unit).
  • Maintains a reliable metric → type mapping that both the query translator and in-house LLM tooling consume to generate correct PromQL.

(Source: sources/2026-03-17-airbnb-observability-ownership-migration)

Consequences

  • Downstream tooling can generate correct, intent-preserving queries even when metric names are legacy/ambiguous.
  • AI assistants generating PromQL produce materially better results when seeded with this metadata vs. raw metric names alone.
  • The metadata engine becomes a central dependency — its freshness and correctness directly affect alert and dashboard reliability.

Seen in

Last updated · 200 distilled / 1,178 read