Alertmanager
What it is
Alertmanager is the CNCF-hosted alert-routing and -deduplication companion to Prometheus. Prometheus servers evaluate alerting rules and push firing alerts to Alertmanager, which handles:
- Grouping — collapsing many related alerts into one notification.
- Inhibition — suppressing alerts that are redundant given other active alerts.
- Silencing — time-bounded mute for expected maintenance.
- Routing — per-alert-label dispatch to specific notification channels (PagerDuty, Opsgenie, email, Slack, webhooks, SNS, etc.).
- High availability — multiple Alertmanagers can form a gossip cluster that shares notification state so they don't double-send.
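The first four responsibilities above can be illustrated with a small toy model. This is a minimal sketch and not Alertmanager's implementation: alerts are reduced to plain label dictionaries, matching is equality-only, and every function name here (`group_alerts`, `is_inhibited`, `is_silenced`, `pick_receiver`, `matches`) is hypothetical. The real service also supports regex matchers, a nested route tree, and timing controls that this sketch leaves out.

```python
from collections import defaultdict

# Toy model only: alerts are plain label dicts and matching is equality-only.

def matches(alert, matchers):
    """Equality-only label matching (hypothetical helper)."""
    return all(alert.get(k) == v for k, v in matchers.items())

def group_alerts(alerts, group_by):
    """Grouping: bucket alerts that share values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)

def is_inhibited(alert, firing, source, target, equal):
    """Inhibition: mute a target alert while a matching source alert fires."""
    if not matches(alert, target):
        return False
    return any(
        other is not alert
        and matches(other, source)
        and all(other.get(k) == alert.get(k) for k in equal)
        for other in firing
    )

def is_silenced(alert, silences, now):
    """Silencing: a silence applies only inside its [start, end) window."""
    return any(
        s["start"] <= now < s["end"] and matches(alert, s["matchers"])
        for s in silences
    )

def pick_receiver(alert, routes, default):
    """Routing: the first route whose matchers fit wins, else the default."""
    for matchers, receiver in routes:
        if matches(alert, matchers):
            return receiver
    return default

# Illustrative data; label values and receiver names are made up.
alerts = [
    {"alertname": "HighLatency", "cluster": "a", "severity": "page", "instance": "web-1"},
    {"alertname": "HighLatency", "cluster": "a", "severity": "page", "instance": "web-2"},
    {"alertname": "ClusterDown", "cluster": "a", "severity": "critical"},
]
groups = group_alerts(alerts, ["alertname", "cluster"])
routes = [({"severity": "page"}, "pagerduty")]
silences = [{"matchers": {"cluster": "a"}, "start": 100, "end": 200}]
```

In this data the two `HighLatency` alerts collapse into one group, and while `ClusterDown` fires for the same cluster they count as inhibited, which is the "redundant given other active alerts" case.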
Stub — expand on future ingests when post-mortems surface Alertmanager-internal details.
Role for this wiki
Alertmanager is the last-mile delivery tier in the Prometheus alerting path. Most observability stacks in the style of Airbnb, Grafana, PlanetScale, or Redpanda pair it with Prometheus by default. On the wiki, Alertmanager shows up in two distinct roles:
- Delivery of production alerts — the default use; alerts fire on SLO breach and page on-call via PagerDuty / equivalent.
- Delivery of heartbeats for Dead Man's Switch — an always-firing alert is routed via Alertmanager to an external channel (e.g., AWS SNS) where the absence of messages triggers a CloudWatch rate alarm. Canonical instance: sources/2026-05-05-airbnb-monitoring-reliably-at-scale.
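The dead-man's-switch role inverts the usual logic: the receiving end pages when messages stop arriving. A minimal sketch of that absence check follows. It is a toy stand-in for the external watcher, not CloudWatch's API or Airbnb's configuration; the class name `HeartbeatWatchdog`, its parameters, and the window length are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class HeartbeatWatchdog:
    """Toy external watcher: pages when too few heartbeats arrive in a window."""
    window_seconds: float
    min_heartbeats: int = 1
    _seen: list = field(default_factory=list)

    def record(self, timestamp):
        """Note that a heartbeat message arrived at the given time."""
        self._seen.append(timestamp)

    def should_page(self, now):
        """True when fewer than min_heartbeats fall in (now - window, now]."""
        recent = [t for t in self._seen if now - self.window_seconds < t <= now]
        return len(recent) < self.min_heartbeats

# Heartbeats arrive at t=0 and t=60, then the pipeline goes quiet.
watchdog = HeartbeatWatchdog(window_seconds=300)
watchdog.record(0)
watchdog.record(60)
healthy = watchdog.should_page(now=120)   # heartbeats still inside the window
stalled = watchdog.should_page(now=600)   # nothing received for over 300 s
```

Because the watcher sits outside the monitored stack, a failure anywhere along the path from Prometheus through Alertmanager to the external channel shows up the same way: the heartbeat count in the window drops below the threshold.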
HA topology
Alertmanagers are typically deployed in HA sets of 2–3 replicas that gossip notification state to prevent duplicate pages. For meta-monitoring, the HA topology must extend to anti-affinity across nodes, AZs, and — specifically at Airbnb — anti-affinity between a Prometheus-Alertmanager pair as a unit so the pair doesn't fail together on a single-cluster incident. See patterns/ha-set-anti-affinity-across-shared-infra.
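One way to reason about the anti-affinity requirement is as a placement check: no failure domain should host more than one member of the HA set. The sketch below is illustrative only. The function name `anti_affinity_violations`, the replica names, and the `node` and `zone` keys are hypothetical, and in practice a scheduler would typically enforce such constraints rather than a script.

```python
from collections import Counter

def anti_affinity_violations(replicas, domains=("node", "zone")):
    """Return (domain, value) pairs that host more than one replica."""
    violations = []
    for domain in domains:
        counts = Counter(replica[domain] for replica in replicas)
        violations.extend(
            (domain, value) for value, n in sorted(counts.items()) if n > 1
        )
    return violations

# Hypothetical placement: two replicas share availability zone az-1.
replicas = [
    {"name": "alertmanager-0", "node": "node-a", "zone": "az-1"},
    {"name": "alertmanager-1", "node": "node-b", "zone": "az-1"},
    {"name": "alertmanager-2", "node": "node-c", "zone": "az-2"},
]
violations = anti_affinity_violations(replicas)
```

The same check could be applied at a coarser level by passing a `cluster` key in `domains`, with each entry in the list standing for a placement unit rather than a single replica.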
Seen in
- sources/2026-05-05-airbnb-monitoring-reliably-at-scale — canonical wiki instance. Airbnb's meta-monitoring layer is explicitly "Prometheus + Alertmanager HA pair" with pair-level anti-affinity. Alertmanager carries the always-firing DMS heartbeat to an external AWS SNS topic; CloudWatch watches the rate and pages when the heartbeat stops. This is the first canonical wiki instance of Alertmanager in its dead-man's-switch delivery role — a different shape from its typical production-alert delivery role.
Related
- systems/prometheus — Alertmanager's upstream source of firing alerts.
- systems/aws-sns — external destination for Airbnb's DMS heartbeat.
- systems/aws-cloudwatch — rate-watchdog at the DMS's external end.
- concepts/meta-monitoring — the parent discipline.
- concepts/dead-mans-switch — the primitive Alertmanager participates in delivering.
- concepts/observability
- patterns/heartbeat-absence-as-alert-trigger — the pattern Alertmanager helps realise.
- patterns/ha-set-anti-affinity-across-shared-infra — the anti-affinity discipline for HA Alertmanager pairs.