
PATTERN Cited by 2 sources

Observability as code

Treat every vendor-managed observability resource (dashboards, alert rules, SLOs, synthetic checks, recording rules, contact points) as a file under version control: authored and edited locally, then pushed back to the vendor by a CLI or as-code tool. The vendor UI stays for reading and for ad-hoc investigation, but the repository is the source of truth.

When to use this pattern

  • Observability resources have grown past "maybe 20 dashboards, maybe 40 alerts" and now need review discipline.
  • Multiple teams author observability resources; ad-hoc UI-based authoring produces orphaned, duplicated, or drifted resources.
  • An org is migrating between observability stacks (vendor swap, OSS → managed, managed-vendor swap) and wants the resources to be portable.
  • Agents are expected to author or modify observability resources at scale; as-code resources live in the same kind of files agents already read and edit.

Required capabilities

  • Pull. The CLI can serialise the current vendor state into local files.
  • Edit. The files are in a format humans and agents can read / diff / merge.
  • Push. The CLI applies local files as the new authoritative state — creating, updating, and (where applicable) deleting vendor-side resources to match.
  • Idempotency. Pushing the same files twice leaves the vendor in the same state as pushing them once; the second push changes nothing.
  • Name / ID stability. Applied resources keep the same identity across push cycles so dashboards, alert rules, and SLOs retain their URLs (for deep-linking, escalation, and history).
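
The push, idempotency, and ID-stability requirements above can be sketched as a small reconcile loop. This is a minimal illustration, not any vendor's actual API: `FakeVendor`, `plan`, and `push` are hypothetical names, and the vendor is modelled as an in-memory dictionary keyed by a stable resource ID.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class FakeVendor:
    """In-memory stand-in for a vendor API (hypothetical, for illustration only)."""
    resources: dict = field(default_factory=dict)


def plan(desired: dict, vendor: FakeVendor) -> dict:
    """Compute which resource IDs a push would create, update, or delete."""
    current = vendor.resources
    return {
        "create": sorted(uid for uid in desired if uid not in current),
        "update": sorted(
            uid for uid in desired
            if uid in current and desired[uid] != current[uid]
        ),
        "delete": sorted(uid for uid in current if uid not in desired),
    }


def push(desired: dict, vendor: FakeVendor) -> dict:
    """Make the vendor match the local files; return the changes applied."""
    changes = plan(desired, vendor)
    for uid in changes["create"] + changes["update"]:
        # Resources are written under their existing ID, never re-created
        # under a new one, so URLs and history stay attached.
        vendor.resources[uid] = copy.deepcopy(desired[uid])
    for uid in changes["delete"]:
        del vendor.resources[uid]
    return changes


vendor = FakeVendor({"cpu-alert": {"threshold": 80}, "old-dash": {"title": "Legacy"}})
desired = {"cpu-alert": {"threshold": 90}, "latency-slo": {"target": 0.999}}

first = push(desired, vendor)   # one create, one update, one delete
second = push(desired, vendor)  # nothing left to change
print(first)
print(second)
```

Computing the plan separately from applying it also gives reviewers and CI a dry-run view of what a push would do before anything touches the vendor.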

Mechanism

┌──────────────┐        pull         ┌──────────────┐
│  Grafana     │  ─────────────────> │  local files │
│  Cloud       │                     │  (YAML/JSON) │
│ (dashboards, │                     └──────┬───────┘
│  alerts,     │                            │
│  SLOs,       │                            │ edit
│  checks)     │                            │ (agent or
│              │                            │  human,
│              │  <─────────────────        │  version
└──────────────┘        push                │  controlled)
                                    ┌──────────────┐
                                    │ git / PR /   │
                                    │ review / CI  │
                                    └──────────────┘

The three verbs matter:

  1. Pull. The as-code lifecycle starts by materialising what's already in production, which avoids the rebuild-from-scratch tax.
  2. Edit. Authors change local files in whatever editor or agent they prefer; review, diff, and merge go through normal VCS.
  3. Push. The CLI applies the reviewed files to the vendor, creating, updating, and (where applicable) deleting resources to match.
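
The pull and edit steps can be sketched as follows, assuming resources are serialised as one JSON file per resource ID. The `pull` and `load` helpers and the file layout are hypothetical choices for illustration. Sorting keys on write is one way to keep diffs limited to real changes.

```python
import json
import tempfile
from pathlib import Path


def pull(vendor_state: dict, repo: Path) -> None:
    """Write each resource to <repo>/<uid>.json in a deterministic form."""
    for uid, body in vendor_state.items():
        # sort_keys and a fixed indent keep the text stable across pulls,
        # so VCS diffs show only real changes.
        text = json.dumps(body, indent=2, sort_keys=True) + "\n"
        (repo / f"{uid}.json").write_text(text, encoding="utf-8")


def load(repo: Path) -> dict:
    """Read the files back into the desired state a push would apply."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(repo.glob("*.json"))
    }


def round_trip_demo():
    vendor_state = {"cpu-alert": {"threshold": 80, "for": "5m"}}
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        pull(vendor_state, repo)
        unchanged = load(repo) == vendor_state  # pull alone changes nothing

        # Edit: a human or agent changes the local file.
        path = repo / "cpu-alert.json"
        alert = json.loads(path.read_text(encoding="utf-8"))
        alert["threshold"] = 90
        path.write_text(
            json.dumps(alert, indent=2, sort_keys=True) + "\n", encoding="utf-8"
        )
        desired = load(repo)  # this is what a push would apply
    return unchanged, desired


print(round_trip_demo())
```

In a real repository the edit would land as a commit and go through PR review and CI before any push runs.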

When to skip this pattern

  • Fewer than ~10 resources total, low change rate, single team — the pull-edit-push overhead exceeds the governance benefit.
  • The vendor doesn't expose idempotent apply APIs (common before CLI tooling matures) — the round-trip loses fidelity.
  • Real-time exploratory dashboarding, where a half-built, mid-edit dashboard shouldn't require a PR.
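
Whether a vendor's round-trip loses fidelity can be checked before adopting the pattern. The sketch below compares a resource before and after a push-then-pull cycle. The two vendor functions are hypothetical stand-ins: one returns the resource unchanged, the other drops unrecognised fields and reassigns the ID.

```python
import copy


def round_trip_diff(original: dict, after: dict) -> dict:
    """Report top-level fields that a push-then-pull cycle lost, added, or changed."""
    return {
        "lost": sorted(k for k in original if k not in after),
        "added": sorted(k for k in after if k not in original),
        "changed": sorted(
            k for k in original if k in after and original[k] != after[k]
        ),
    }


def lossless_vendor(resource: dict) -> dict:
    """Hypothetical vendor that stores and returns the resource unchanged."""
    return copy.deepcopy(resource)


def lossy_vendor(resource: dict) -> dict:
    """Hypothetical vendor that drops unknown fields and reassigns the ID."""
    stored = {k: v for k, v in resource.items() if k in {"uid", "title"}}
    stored["uid"] = "generated-123"
    return stored


dashboard = {
    "uid": "checkout-latency",
    "title": "Checkout latency",
    "annotations": ["deploys"],
}

print(round_trip_diff(dashboard, lossless_vendor(dashboard)))
print(round_trip_diff(dashboard, lossy_vendor(dashboard)))
```

A non-empty "lost" or "changed" list, especially on the ID field, suggests the vendor is not yet a good fit for pull-edit-push.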

Tradeoffs

  • Upfront investment in CLI tooling + pull-edit-push ergonomics.
  • File-format churn as vendors add new fields.
  • UI affordances lag. Authors who live in the vendor UI must learn new tooling.
  • Concurrent edits to shared dashboards surface as routine merge conflicts in VCS instead of being mediated by the UI.

Seen in

  • sources/2026-04-29-grafana-get-observability-in-the-terminal-for-you-and-your-agents-with-the-gcx-cli-tool — canonical Grafana-vendor framing. gcx ships the full pull-edit-push lifecycle across dashboards, alerts, SLOs, and synthetic checks. Post framing: "Pull dashboards, alerts, SLOs, and checks as files. Edit them locally with your agent. Push them back. Open a deep link into Grafana Cloud the moment a human needs to look." First vendor commitment to the whole lifecycle as a single shipping contract. Ties the pattern to the agent-ergonomic-CLI principles the CLI commits to (stable JSON/YAML, exit codes, machine-readable catalog).
  • sources/2026-03-17-airbnb-observability-ownership-migration — Airbnb built an in-house alert-authoring framework with autocomplete, backtesting, and diffing, implementing patterns/alerts-as-code (the narrower sibling pattern). Airbnb's decision to migrate to the as-code framework mid-vendor-migration empirically validated the "as-code centralises review and reduces per-team migration work" claim.