
PATTERN Cited by 1 source

Two-tiered resilience gate

Pattern

Embed two levels of resilience validation in your CI/CD pipeline, each with a different cost/coverage trade-off:

  1. Lightweight gate (every commit) — policy-as-code checks (e.g., OPA validating IaC and Dockerfiles) that run in seconds. Catches basic configuration issues: missing health checks, single-AZ deployments, absent circuit breakers.

  2. Full gate (architectural changes only) — complete resilience assessments running 15–20 experiments over 15–45 minutes. Triggered when the deployment includes significant architectural changes; skipped for routine code changes that pass the lightweight gate.
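As a rough illustration of the lightweight gate, the three example checks above can be sketched in Python. This is a stand-in for a real policy engine such as OPA, not its API: the `lightweight_gate` function and the `health_check`, `availability_zones`, `dependencies`, and `circuit_breaker` field names are all hypothetical, chosen only for this sketch.

```python
from typing import Any


def lightweight_gate(service: dict[str, Any]) -> list[str]:
    """Return policy violations for one service definition.

    The dictionary layout is an assumption made for this sketch; a real
    gate would read IaC templates or Dockerfiles instead.
    """
    violations: list[str] = []

    # A service with no health check cannot be removed from rotation when it fails.
    if not service.get("health_check"):
        violations.append("missing health check")

    # Fewer than two distinct availability zones means a single-AZ deployment.
    if len(set(service.get("availability_zones", []))) < 2:
        violations.append("single-AZ deployment")

    # Every downstream dependency is expected to declare a circuit breaker.
    for dep in service.get("dependencies", []):
        if not dep.get("circuit_breaker", False):
            name = dep.get("name", "?")
            violations.append(f"no circuit breaker for dependency '{name}'")

    return violations


if __name__ == "__main__":
    example = {
        "health_check": "/healthz",
        "availability_zones": ["zone-a"],
        "dependencies": [{"name": "payments"}],
    }
    for violation in lightweight_gate(example):
        print(violation)
```

A commit would pass this tier only when the returned list is empty. Because the checks are plain dictionary lookups, they finish in well under a second, which is what makes running them on every commit affordable.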

Why two tiers

Running the full set of chaos experiments on every commit is impractical: each push would wait 15–45 minutes. Skipping resilience checks entirely, however, lets regressions slip into production. The two-tiered approach gives every commit fast regression coverage (tier 1) and reserves the comprehensive assessment for changes that actually alter the architecture (tier 2).
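The pattern does not say how to recognise a change that alters the architecture. One possible heuristic, sketched below, is to look at which files a commit touches. The path patterns and the `needs_full_gate` name are assumptions for illustration only; a real pipeline would tune them to its own repository layout.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical patterns for files whose modification suggests an
# architectural change. Note that fnmatch-style "*" also matches "/".
ARCHITECTURAL_PATTERNS = (
    "infra/*",
    "*.tf",
    "k8s/*",
    "Dockerfile",
    "*/Dockerfile",
)


def needs_full_gate(
    changed_files: Iterable[str],
    patterns: Iterable[str] = ARCHITECTURAL_PATTERNS,
) -> bool:
    """Return True if any changed path matches an architectural pattern."""
    pattern_list = list(patterns)
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in pattern_list
    )


if __name__ == "__main__":
    print(needs_full_gate(["src/handlers/orders.py"]))
    print(needs_full_gate(["src/app.py", "infra/network/main.tf"]))
```

Routine code changes fall through to tier 1 only, while a commit that touches infrastructure definitions is routed to the full assessment as well.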

Pipeline flow

Commit → Build → Deploy to Test → Resilience Regression (3-5 scenarios, ~2-3 min)
  → Integration Tests → Deploy to Staging
  → [if architectural change] Full Assessment (15-20 experiments, 15-45 min)
  → Manual Approval → Deploy to Production → Continuous Monitoring
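The flow above can be expressed as a small planning function, shown here as a minimal sketch. The stage names and durations are taken from the flow itself; the `plan_pipeline` and `resilience_overhead_minutes` names are hypothetical, and how a pipeline decides that a change is architectural is left to the caller.

```python
def plan_pipeline(architectural_change: bool) -> list[str]:
    """Return the ordered stages for one commit."""
    stages = [
        "Build",
        "Deploy to Test",
        "Resilience Regression",
        "Integration Tests",
        "Deploy to Staging",
    ]
    # The full assessment is the only conditional stage in the flow.
    if architectural_change:
        stages.append("Full Assessment")
    stages += ["Manual Approval", "Deploy to Production", "Continuous Monitoring"]
    return stages


def resilience_overhead_minutes(architectural_change: bool) -> tuple[int, int]:
    """Return the (min, max) minutes added by the resilience stages alone."""
    low, high = 2, 3  # regression suite: ~2-3 min
    if architectural_change:
        low, high = low + 15, high + 45  # full assessment: 15-45 min
    return low, high


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, resilience_overhead_minutes(flag), plan_pipeline(flag))
```

With these figures, a routine commit spends about 2 to 3 minutes on resilience checks, while an architectural change spends roughly 17 to 48 minutes.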

Seen in
