
SYSTEM Cited by 1 source

Alertmanager

What it is

Alertmanager is the CNCF-hosted alert-routing and -deduplication companion to Prometheus. Prometheus servers evaluate alerting rules and push firing alerts to Alertmanager, which handles:

  • Grouping — collapsing many related alerts into one notification.
  • Inhibition — suppressing alerts that are redundant given other active alerts.
  • Silencing — time-bounded mute for expected maintenance.
  • Routing — per-alert-label dispatch to specific notification channels (PagerDuty, Opsgenie, email, Slack, webhooks, SNS, etc.).
  • High availability — multiple Alertmanagers can form a gossip cluster that shares notification state so they don't double-send.
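
To make the grouping and inhibition steps above concrete, here is a minimal Python sketch. It is an illustration only, not Alertmanager's implementation: alerts are modeled as plain label dictionaries, and the helper names `group_alerts`, `matches`, and `inhibit` are hypothetical. The `group_by` and `equal` parameters loosely mirror the similarly named ideas in Alertmanager's routing and inhibition configuration, using exact label equality only.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Collapse alerts that share the same values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        # Alerts with identical values for every group_by label share a key.
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def matches(alert, matcher):
    """True when the alert carries every label/value pair in the matcher."""
    return all(alert.get(k) == v for k, v in matcher.items())


def inhibit(alerts, source_match, target_match, equal):
    """Drop target alerts while a source alert with equal labels is active."""
    sources = [a for a in alerts if matches(a, source_match)]
    kept = []
    for alert in alerts:
        suppressed = matches(alert, target_match) and any(
            src is not alert
            and all(src.get(label) == alert.get(label) for label in equal)
            for src in sources
        )
        if not suppressed:
            kept.append(alert)
    return kept


# Hypothetical sample alerts (label sets are made up for illustration).
alerts = [
    {"alertname": "ClusterDown", "severity": "critical", "cluster": "a"},
    {"alertname": "HighLatency", "severity": "warning", "cluster": "a", "instance": "web-1"},
    {"alertname": "HighLatency", "severity": "warning", "cluster": "a", "instance": "web-2"},
    {"alertname": "HighLatency", "severity": "warning", "cluster": "b", "instance": "web-3"},
]

grouped = group_alerts(alerts, ["alertname", "cluster"])
remaining = inhibit(
    alerts,
    source_match={"severity": "critical"},
    target_match={"severity": "warning"},
    equal=["cluster"],
)
```

In this sample, the two `HighLatency` alerts in cluster `a` fall into one group, and both are suppressed by the critical `ClusterDown` alert in the same cluster, while the warning in cluster `b` is untouched.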

Stub — expand on future ingests when post-mortems surface Alertmanager-internal details.

Role for this wiki

Alertmanager is the last-mile delivery tier in the Prometheus alerting path. Most Airbnb-/Grafana-/PlanetScale-/Redpanda-style observability stacks pair it with Prometheus by default. On the wiki, Alertmanager shows up in two distinct roles:

  1. Delivery of production alerts — the default use; alerts fire on SLO breach and page on-call via PagerDuty / equivalent.
  2. Delivery of heartbeats for Dead Man's Switch — an always-firing alert is routed via Alertmanager to an external channel (e.g., AWS SNS) where the absence of messages triggers a CloudWatch rate alarm. Canonical instance: sources/2026-05-05-airbnb-monitoring-reliably-at-scale.
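
The dead man's switch in role 2 inverts the usual alarm logic: the receiver pages when messages stop arriving. A minimal Python sketch of that receiver-side check follows. It is an assumption-laden illustration, not how CloudWatch evaluates a rate alarm: the function name `heartbeat_missing`, the timestamps in seconds, and the `window_seconds` and `min_count` parameters are all hypothetical.

```python
def heartbeat_missing(heartbeat_times, now, window_seconds, min_count=1):
    """Return True when too few heartbeats arrived in the trailing window.

    heartbeat_times: arrival times of heartbeat messages, in seconds.
    now:             current time, in the same units.
    window_seconds:  length of the trailing window to inspect.
    min_count:       minimum number of heartbeats expected in that window.
    """
    recent = [t for t in heartbeat_times if now - window_seconds < t <= now]
    return len(recent) < min_count


# Hypothetical heartbeats arriving once a minute, then stopping.
heartbeats = [0, 60, 120]

healthy = heartbeat_missing(heartbeats, now=150, window_seconds=120)
stalled = heartbeat_missing(heartbeats, now=600, window_seconds=120)
```

Because the check runs outside the monitored stack, it still fires when Prometheus or Alertmanager itself is down, which is the point of routing the heartbeat to an external channel.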

HA topology

Alertmanagers are typically deployed in HA sets of 2–3 replicas that gossip notification state to prevent duplicate pages. For meta-monitoring, the HA topology must extend to anti-affinity across nodes, AZs, and — specifically at Airbnb — anti-affinity between a Prometheus-Alertmanager pair as a unit so the pair doesn't fail together on a single-cluster incident. See patterns/ha-set-anti-affinity-across-shared-infra.
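
As a rough illustration of the node- and AZ-level anti-affinity constraint described above, the following Python sketch checks a proposed replica placement for shared failure domains. The replica records, the `node` and `az` field names, and the function `shared_failure_domains` are hypothetical; a real deployment would express this in its scheduler's own placement rules.

```python
def shared_failure_domains(replicas, keys=("node", "az")):
    """List failure domains that host more than one replica.

    Each result is (key, value, first_replica, conflicting_replica).
    An empty list means the placement satisfies anti-affinity on every key.
    """
    seen = {}
    conflicts = []
    for replica in replicas:
        for key in keys:
            domain = (key, replica[key])
            if domain in seen:
                conflicts.append((key, replica[key], seen[domain], replica["name"]))
            else:
                seen[domain] = replica["name"]
    return conflicts


# Hypothetical placements for a three-replica HA set.
spread = [
    {"name": "am-0", "node": "n1", "az": "az-1"},
    {"name": "am-1", "node": "n2", "az": "az-2"},
    {"name": "am-2", "node": "n3", "az": "az-3"},
]
colocated = [
    {"name": "am-0", "node": "n1", "az": "az-1"},
    {"name": "am-1", "node": "n2", "az": "az-1"},
]

ok = shared_failure_domains(spread)
bad = shared_failure_domains(colocated)
```

The same check could be extended to pair-level anti-affinity by adding a field for the shared cluster to each record and including it in `keys`.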

Seen in

  • sources/2026-05-05-airbnb-monitoring-reliably-at-scale — canonical wiki instance. Airbnb's meta-monitoring layer is explicitly "Prometheus + Alertmanager HA pair" with pair-level anti-affinity. Alertmanager carries the always-firing DMS heartbeat to an external AWS SNS topic; CloudWatch watches the rate and pages when the heartbeat stops. This is the first canonical wiki instance of Alertmanager in its dead-man's-switch delivery role — a different shape from its typical production-alert delivery role.