SYSTEM Cited by 1 source

rds-health

What it is

rds-health is an open-source Go CLI released by Zalando in February 2024. It automates anomaly and performance-issue detection across an AWS RDS fleet by evaluating Zalando's 12 golden signals (concepts/golden-signals-rds) against AWS Performance Insights time-series metrics. Architecturally it is a frontend for AWS APIs: it does not collect or store metrics itself. Instead, it queries the Performance Insights API for each RDS instance or cluster in the configured account(s) and region(s), applies the golden-signal thresholds, and emits a consolidated health report. Source: github.com/zalando/rds-health.
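
The "fetch series, apply thresholds, report" flow can be sketched in a few lines of Go. This is a minimal illustration and not rds-health's actual code: the `Signal` and `Finding` types, the `Evaluate` function, the metric names, and the threshold values are all hypothetical, and the time series are hard-coded where the real tool would query Performance Insights.

```go
package main

import "fmt"

// Signal is a hypothetical golden-signal rule: a metric name plus the
// upper bound its mean is expected to stay under.
type Signal struct {
	Metric string  // illustrative name, not taken from rds-health
	Max    float64 // illustrative threshold, not one of Zalando's values
}

// Finding records one signal whose mean exceeded its threshold.
type Finding struct {
	Metric string
	Mean   float64
	Max    float64
}

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// Evaluate checks every signal against the instance's time series and
// returns the breaches in the order the signals were given.
// Signals with no samples are skipped rather than reported.
func Evaluate(signals []Signal, series map[string][]float64) []Finding {
	var findings []Finding
	for _, s := range signals {
		samples, ok := series[s.Metric]
		if !ok || len(samples) == 0 {
			continue
		}
		if m := mean(samples); m > s.Max {
			findings = append(findings, Finding{Metric: s.Metric, Mean: m, Max: s.Max})
		}
	}
	return findings
}

func main() {
	// Made-up thresholds and samples for a single instance.
	signals := []Signal{
		{Metric: "cpu_utilization_pct", Max: 60},
		{Metric: "swap_usage_mb", Max: 1},
	}
	series := map[string][]float64{
		"cpu_utilization_pct": {72, 81, 78},
		"swap_usage_mb":       {0, 0, 0},
	}
	for _, f := range Evaluate(signals, series) {
		fmt.Printf("FAIL %s: mean %.1f exceeds %.1f\n", f.Metric, f.Mean, f.Max)
	}
}
```

Keeping the rules as data (a slice of `Signal` values) rather than as branches in code is one way to make thresholds adjustable per fleet; whether rds-health structures its rules this way is not stated in the source.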

Why it exists

Zalando's database-per-service microservices posture produced a fleet of RDS Postgres instances large enough that manual anomaly detection via ad-hoc scripts and per-instance CloudWatch dashboards became untenable: "A combination of manual processes and ad-hoc scripts to manage the health conditions of database instances are not an option at the scale. [...] some teams are required to allocate engineers for sprint or even months for such activities." AWS already ships the raw observability surface (CloudWatch, Performance Insights); the missing piece was a utility that evaluates the methodology holistically across the fleet with a single invocation. rds-health closes that gap (Source: sources/2024-02-19-zalando-twelve-golden-signals).

Key features

Per the article, rds-health exposes three primary operations:

  • Configuration inventory — show configuration of all RDS instances and clusters in an account/region.
  • Fleet health check — evaluate the 12 golden signals against Zalando's empirical thresholds on all RDS deployments and return per-instance health status.
  • Capacity planning — project forward from current metrics to estimate capacity headroom.
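
The article does not say how rds-health projects capacity forward. As one plausible reading of "project forward from current metrics", the sketch below fits a least-squares line through daily usage samples and estimates how many days remain until a limit is reached. The function name `DaysUntilFull`, the one-sample-per-day assumption, and the numbers are all hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// DaysUntilFull fits a least-squares line through usage samples (one per
// day, oldest first) and returns how many days after the last sample the
// fitted line reaches capacity. It returns +Inf when there are fewer than
// two samples or when usage is flat or shrinking, and 0 when the fitted
// line is already at or above capacity.
func DaysUntilFull(usage []float64, capacity float64) float64 {
	if len(usage) < 2 {
		return math.Inf(1)
	}
	n := float64(len(usage))
	var sumX, sumY, sumXY, sumXX float64
	for i, y := range usage {
		x := float64(i) // day index: 0, 1, 2, ...
		sumX += x
		sumY += y
		sumXY += x * y
		sumXX += x * x
	}
	slope := (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
	if slope <= 0 {
		return math.Inf(1)
	}
	intercept := (sumY - slope*sumX) / n
	// Day index at which the fitted line crosses the capacity limit.
	xFull := (capacity - intercept) / slope
	remaining := xFull - (n - 1)
	if remaining < 0 {
		return 0
	}
	return remaining
}

func main() {
	// Made-up storage usage in GiB over four consecutive days.
	usage := []float64{100, 110, 120, 130}
	fmt.Printf("estimated days until 200 GiB: %.1f\n", DaysUntilFull(usage, 200))
}
```

A linear trend is the simplest possible model; workloads with seasonality or step changes would need something more careful.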

Customisation is explicit: "The utility can be easily customised to meet specific use cases, allowing users to tailor their workflows to their unique needs" — the threshold tables and signal definitions are knobs, not hardcoded policy.

Place on the wiki

rds-health is the canonical example of the fleet-wide-methodology-via-CLI pattern (patterns/fleet-wide-methodology-via-cli): when the bottleneck to scaling ops is fragmentation of methodology across teams, you ship the methodology as an executable artefact rather than as a wiki page. The wiki separates the methodology (concepts/golden-signals-rds) from the utility that automates it (this page); either one is reusable without the other.

Seen in
