
CONCEPT Cited by 1 source

Push vs pull monitoring

A monitoring architecture is fundamentally either pull-based (the monitoring server periodically scrapes metrics from endpoints) or push-based (the monitored system pushes metrics into a collector / stream / backend as they're produced). Both shapes ship in production at scale; the choice axis is who drives the data flow, and the trade-offs cluster around three failure modes: API-throttling under fan-out, polling latency as freshness floor, and operational coupling between monitor and target.

This concept is the observability-altitude sibling of concepts/pull-vs-push-streams (which is the JS/streams API altitude trade-off). The same word, different load-bearing surface.

The two models at the monitoring altitude

| Axis | Pull-based | Push-based |
|---|---|---|
| Canonical instance | Prometheus scraping /metrics | CloudWatch Metric Streams, StatsD, OTel push exporter |
| Who drives | Monitor server | Monitored system / source |
| Scrape / emit cadence | Configurable interval at scrape side | Event-driven at source |
| Freshness floor | Scrape interval (often 15–60s) | Source emission cadence |
| API call shape | One scrape per metric per interval | One push per metric per emission |
| Failure mode at scale | API throttling, scrape misses | Producer-side back-pressure, lost emissions |
| Service-discovery requirement | Mandatory: monitor needs target list | Optional: sources self-identify |
| Network direction | Monitor → target | Source → collector |
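The freshness-floor row can be made concrete with a toy model. This is illustrative only: the function names, timestamps, and the fixed 0.5 s push transport delay are hypothetical, and the model assumes each produced value stays readable until the next scrape (a value overwritten within one interval is never seen by the pull side at all, which the sketch ignores).

```python
import math


def pull_delays(produced_at: list[float], scrape_interval_s: float) -> list[float]:
    """Delay from each production time to the first scrape at or after it.

    Scrapes are assumed to happen at 0, T, 2T, ... for T = scrape_interval_s.
    Assumes each value survives until the next scrape; values overwritten
    within one interval would never be seen, which this toy model ignores.
    """
    return [math.ceil(t / scrape_interval_s) * scrape_interval_s - t for t in produced_at]


def push_delays(produced_at: list[float], transport_delay_s: float) -> list[float]:
    """Each emission is delivered once, after a fixed (assumed) transport delay."""
    return [transport_delay_s for _ in produced_at]


if __name__ == "__main__":
    produced = [1.0, 75.0, 179.0]  # one value per scrape window
    print(pull_delays(produced, scrape_interval_s=60.0))  # [59.0, 45.0, 1.0]
    print(push_delays(produced, transport_delay_s=0.5))   # [0.5, 0.5, 0.5]
```

A value produced just after a scrape waits almost a full interval, so worst-case pull staleness approaches the scrape interval itself; on the push side the bound is emission cadence plus transport, not a monitor-side clock.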

Why pull-based monitoring throttles at scale

The 2026-05-13 source canonicalises the production failure mode of pull-based monitoring at scale. The customer ran Prometheus with the AWS CloudWatch exporter, a pull-based shape that scrapes CloudWatch for each configured metric on a fixed interval. At fleet scale, three things go wrong:

"Our customer's current monitoring solution with Prometheus and Amazon CloudWatch exporter using a pull-based approach resulted in higher API throttling. This caused metric loss and created gaps in observability data for business-critical systems. The frequent polling approach in this model also resulted in higher costs from API calls. This polling solution did not satisfy their requirement of sub-minute latency for real-time alerting." (Source)

Three named failure modes:

  1. API throttling: every pull-side scrape is an API call against the upstream metrics provider. CloudWatch's GetMetricData / GetMetricStatistics calls are quota-throttled; with thousands of metrics across dozens of targets scraped every 60s, the call rate saturates the quota and the exporter drops metrics.
  2. API cost amplification: even when not throttled, per-call pricing turns the polling overhead into a meaningful bill line.
  3. Polling-interval freshness floor: pull-based monitoring cannot deliver sub-minute alerting if the scrape interval is 60s, regardless of how fast the target produces fresh values. The polling interval is structurally a freshness budget the monitor cannot beat.
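The arithmetic behind these three failure modes can be sketched as a small load model. Everything here is an assumption for illustration: the `PullLoad` class, the fleet size, the 50 calls/s quota (not CloudWatch's actual limit; check the service quotas for your account and region), and the 500-metrics-per-call batching factor. It models the worst case of one upstream call per metric per target per scrape.

```python
import math
from dataclasses import dataclass


@dataclass
class PullLoad:
    """Hypothetical load model for a pull-based exporter against a metrics API."""
    metrics_per_target: int    # metrics configured for each target
    targets: int               # number of scraped targets
    scrape_interval_s: float   # seconds between scrapes
    metrics_per_call: int = 1  # batching factor; 1 = one API call per metric

    def calls_per_scrape(self) -> int:
        # One call per batch of metrics, repeated for every target.
        return math.ceil(self.metrics_per_target / self.metrics_per_call) * self.targets

    def calls_per_second(self) -> float:
        return self.calls_per_scrape() / self.scrape_interval_s

    def throttled(self, quota_calls_per_second: float) -> bool:
        # Steady-state rate above the (assumed) quota means calls get rejected.
        return self.calls_per_second() > quota_calls_per_second

    def calls_per_month(self, days: int = 30) -> int:
        # Drives per-call cost; multiply by a per-call price to estimate spend.
        scrapes = days * 86_400 / self.scrape_interval_s
        return int(self.calls_per_scrape() * scrapes)

    def meets_freshness_target(self, target_s: float) -> bool:
        # Worst-case staleness approaches one full scrape interval.
        return self.scrape_interval_s < target_s


if __name__ == "__main__":
    fleet = PullLoad(metrics_per_target=2000, targets=24, scrape_interval_s=60.0)
    print(fleet.calls_per_second())            # 800.0
    print(fleet.throttled(50.0))               # True
    print(fleet.meets_freshness_target(60.0))  # False: 60 s scrapes miss a sub-minute target
    batched = PullLoad(2000, 24, 60.0, metrics_per_call=500)
    print(batched.calls_per_scrape())          # 96
```

Batching shrinks the call count by roughly the batch factor, which can relieve throttling and cost, but it does nothing for the freshness check: that depends only on the scrape interval.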

Why push-based monitoring fixes the API-throttling axis

Push-based monitoring inverts the data flow: the source emits the metric (or a metric stream emits a snapshot of it), and the emission goes once to the collector. Architectural consequences:

  • No per-metric monitor-side API calls. The monitor doesn't fetch — it receives. There's no API quota in the monitor's scrape direction, because there's no scrape.
  • Sub-second freshness in principle. The lower bound on monitor-side latency is the source-side emission cadence, not a fixed scrape interval.
  • Producer-side back-pressure replaces monitor-side throttling. The push pipe (StatsD UDP, Metric Streams + Firehose, OTel exporter HTTP) becomes the constrained resource. Lost emissions show up as producer-side queue overflow rather than monitor-side scrape failures.
  • Service-discovery becomes optional. Sources self-identify on push; the monitor doesn't need a complete target list to collect from a new source.
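A minimal sketch of the producer side of a push pipe, assuming StatsD's plain-text gauge line format (`name:value|g`) sent over UDP. The class `BoundedStatsdPusher` and its capacity are hypothetical; the point is to show where back-pressure lands. Emissions beyond the buffer's capacity are dropped and counted at the source, and the collector never learns they existed.

```python
import socket
from collections import deque


class BoundedStatsdPusher:
    """Hypothetical push-side emitter: buffers StatsD gauge lines, drops on overflow."""

    def __init__(self, host: str, port: int, capacity: int = 1000) -> None:
        self._addr = (host, port)
        self._capacity = capacity
        self._buffer = deque()  # pending StatsD lines, oldest first
        self.dropped = 0        # emissions lost to back-pressure
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def gauge(self, name: str, value: float) -> bool:
        """Queue one gauge in StatsD line format; return False if it was dropped."""
        if len(self._buffer) >= self._capacity:
            # Producer-side loss: the collector never sees this emission.
            self.dropped += 1
            return False
        self._buffer.append(f"{name}:{value}|g")
        return True

    def flush(self) -> int:
        """Send each buffered line as its own UDP datagram; return the count sent."""
        sent = 0
        while self._buffer:
            line = self._buffer.popleft()
            self._sock.sendto(line.encode("utf-8"), self._addr)
            sent += 1
        return sent

    def close(self) -> None:
        self._sock.close()
```

In use, with a StatsD-compatible listener assumed on 127.0.0.1:8125, the source calls `gauge(...)` as values change and `flush()` on a timer. The sketch drops the newest emission when full; dropping the oldest (for example with `deque(maxlen=...)`) is the other common policy and favours freshness over completeness.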

What push monitoring gives up

  • Query-rate control. Pull lets the monitor decide when to ask. Push hands that control to the source, which can over-emit (high cost, high collector load) or under-emit (gaps).
  • Implicit liveness signal. A target that fails to respond to a scrape is implicitly dead from the monitor's perspective. In a push system, a silent source can't be distinguished from a healthy source with nothing to say without explicit heartbeats.
  • Centralised relabeling / filtering. Prometheus's pull-side relabel-configs apply transformations at scrape time. Push systems push that responsibility upstream to the source or downstream to the collector — typically requiring a richer collector tier (e.g. the OTel collector's processor stage).
  • Long-lived target tracking by absence. Pull monitors know when a target stops appearing; push monitors need explicit TTL or heartbeat semantics on metrics to detect silence.
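The liveness and absence-tracking trade-offs can be sketched as a collector-side TTL check. This is a hypothetical helper (`SilenceDetector`, the 30 s TTL, and the source names are illustrative), not part of any named collector: it records when each source last emitted and reports sources that have been silent for longer than the TTL.

```python
from dataclasses import dataclass, field


@dataclass
class SilenceDetector:
    """Hypothetical collector-side tracker: flags push sources that went quiet."""
    ttl_s: float  # silence longer than this marks a source as suspect
    last_seen: dict[str, float] = field(default_factory=dict)

    def observe(self, source: str, now: float) -> None:
        # Any arrival (a metric or an explicit heartbeat) refreshes the source.
        self.last_seen[source] = now

    def silent_sources(self, now: float) -> list[str]:
        # Pull gets this for free from failed scrapes; push must infer it from age.
        return sorted(s for s, t in self.last_seen.items() if now - t > self.ttl_s)


if __name__ == "__main__":
    detector = SilenceDetector(ttl_s=30.0)
    detector.observe("api-1", now=0.0)
    detector.observe("api-2", now=25.0)
    print(detector.silent_sources(now=31.0))  # ['api-1']
```

The sketch cannot catch a source that never emitted: it is absent from `last_seen`, so detecting it needs an expected-source list, which is service discovery again.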

Mixed shapes are common

In practice many production architectures do both: push for high-rate application metrics where freshness matters, pull for infrastructure metrics where service discovery is the harder problem. The 2026-05-13 source's customer architecture itself is a hybrid — push at the collector ingress (CloudWatch Metric Streams → Firehose → Lambda → NLB → OTel collector), but the collector then exports onward to potentially pull-shaped backends (e.g. Grafana Cloud).
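The push ingress of that hybrid can be sketched at the Lambda hop. This is an assumption-heavy illustration, not the source's code: it assumes the function receives Firehose-style records (a base64 `data` field per record) whose payload is newline-delimited JSON in the Metric Streams JSON output format (`namespace`, `metric_name`, `dimensions`, `timestamp` in epoch milliseconds, and a `value` object with `max`/`min`/`sum`/`count`). Metric Streams can also emit an OpenTelemetry format, which a pipeline ending at an OTel collector may well use instead. The helper name `decode_metric_stream_records` and the flattened output shape are hypothetical, and the onward send through the NLB to the collector is omitted.

```python
import base64
import json
from typing import Any


def decode_metric_stream_records(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten Firehose-style records into one dict per metric data point.

    Assumes each record's base64 'data' holds newline-delimited JSON objects
    in the Metric Streams JSON shape described above.
    """
    points = []
    for record in event.get("records", []):
        payload = base64.b64decode(record["data"]).decode("utf-8")
        for line in payload.splitlines():
            if not line.strip():
                continue  # tolerate blank lines between objects
            obj = json.loads(line)
            points.append({
                "name": f'{obj["namespace"]}/{obj["metric_name"]}',
                "attributes": obj.get("dimensions", {}),
                "timestamp_ms": obj["timestamp"],
                # Keep sum and count so downstream stages can aggregate correctly.
                "sum": obj["value"]["sum"],
                "count": obj["value"]["count"],
            })
    return points
```

Keeping `sum` and `count` rather than a precomputed average lets a downstream stage combine data points across records without averaging averages.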

Seen in

  • sources/2026-05-13-aws-streaming-cloudwatch-metrics-to-vpc-based-opentelemetry-collectors-using-lambda: first canonical wiki home. The customer migrated off Prometheus + CloudWatch exporter (pull) onto CloudWatch Metric Streams + Firehose + Lambda + OTel collector (push). The load-bearing rationale is the API-throttling failure mode: "resulted in higher API throttling. This caused metric loss and created gaps in observability data for business-critical systems." The push architecture's stated benefits ("reducing frequent polling and API calls, enabling near real-time data transmission" and "sub-minute latency for real-time alerting") are the inverse of pull's named failures.
Last updated · 542 distilled / 1,571 read