PATTERN Cited by 1 source
Progressive fault injection
Pattern
Progressive fault injection applies canary-deployment principles to chaos experiments: inject faults into a tiny fraction of the system, validate safety, then expand scope incrementally. Each expansion step is gated by automated health checks. If any stop-condition threshold is breached, the experiment halts before customers are affected.
Mechanism
- Start minimal — affect only 1% of target resources.
- Validate — monitor error rates, latency, and availability against stop-condition thresholds set well below SLA limits.
- Expand — if stable, increase to 5%, then 10%, then 25%.
- Halt on breach — any stop-condition alarm immediately terminates the experiment and rolls back injected faults.
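The four steps above can be sketched as a small control loop. This is a minimal illustration, not the AWS FIS API: `Health`, `StopCondition`, `StageResult`, and `run_progressive` are hypothetical names, and the `inject`, `rollback`, and `probe` callables stand in for whatever fault-injection and monitoring tooling is actually in use.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Health:
    """Hypothetical health snapshot taken while faults are active."""
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float   # 99th percentile latency in milliseconds


@dataclass(frozen=True)
class StopCondition:
    """Thresholds assumed to be set well below the SLA limits."""
    max_error_rate: float
    max_p99_latency_ms: float

    def breached(self, health: Health) -> bool:
        return (health.error_rate > self.max_error_rate
                or health.p99_latency_ms > self.max_p99_latency_ms)


@dataclass(frozen=True)
class StageResult:
    last_safe_pct: Optional[float]   # highest stage that passed, if any
    halted_at_pct: Optional[float]   # stage that breached, None if all passed


def run_progressive(
    stages: Sequence[float],
    inject: Callable[[float], None],
    rollback: Callable[[], None],
    probe: Callable[[], Health],
    stop: StopCondition,
) -> StageResult:
    """Expand fault scope stage by stage, halting on the first breach."""
    last_safe: Optional[float] = None
    try:
        for pct in stages:
            inject(pct)                      # start minimal, then expand
            if stop.breached(probe()):       # validate against stop conditions
                return StageResult(last_safe, pct)
            last_safe = pct
        return StageResult(last_safe, None)
    finally:
        # Injected faults are removed whether the run halted or completed.
        rollback()
```

Calling `run_progressive((1, 5, 10, 25), ...)` mirrors the 1% → 5% → 10% → 25% progression. Putting the rollback in `finally` is a design choice in this sketch so that faults are also cleaned up if a probe raises an exception.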
Why it works
By limiting the initial blast radius to a small fraction of target resources (1% in the first step), teams can surface weaknesses while keeping the risk of an outage low. The progression also yields a gradient of failure-tolerance data: if the system survives 5% but fails at 10%, its tolerance margin lies somewhere between those two levels.
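As a rough illustration of how stage outcomes translate into tolerance data, the sketch below reduces a set of pass/fail results to a bracket between the highest passing stage and the lowest failing one. The function name `tolerance_bracket` and the input shape (a mapping from stage percentage to a pass flag) are assumptions made for this example.

```python
from typing import Mapping, Optional, Tuple


def tolerance_bracket(
    outcomes: Mapping[float, bool],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (highest passing pct, lowest failing pct).

    Either side is None when no stage passed or no stage failed.
    """
    passed = [pct for pct, ok in outcomes.items() if ok]
    failed = [pct for pct, ok in outcomes.items() if not ok]
    lower = max(passed) if passed else None
    upper = min(failed) if failed else None
    if lower is not None and upper is not None and upper <= lower:
        # A failure below a passing stage means the results are not monotonic
        # and cannot be summarized as a single bracket.
        raise ValueError("stage outcomes are not monotonic")
    return lower, upper


# Survived 1% and 5%, breached at 10%: the margin lies between 5 and 10.
print(tolerance_bracket({1: True, 5: True, 10: False}))
```

A finer-grained follow-up run between the two bounds (for example 6%, 7%, 8%) would narrow the bracket further, under the same stop conditions.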
Seen in
- sources/2026-06-22-aws-architecting-ai-powered-resilience-framework-on-aws — AWS FIS experimentation layer with 1% → 5% → 10% → 25% progression