CONCEPT
Break-glass escape hatch¶
Break-glass escape hatch is an explicit, telemetry-tracked mechanism for a human to override an automated gate when the automation is wrong, unavailable, or not fast enough, without shutting the automation off.
The name comes from the break-glass fire-alarm affordance: the override exists, it is visible, and every use is recorded.
Why it matters for AI-in-the-critical-path systems¶
When an AI system sits between humans and shipping code (or deploying infrastructure, or approving spend), two failure modes are unavoidable:
- The AI is wrong — a false-positive critical finding blocks a hotfix that is actually correct.
- The AI is down — the upstream model provider is in outage, the orchestrator has crashed, a regression in the prompt has broken severity classification.
Without an override, both failure modes convert AI uptime into human-productivity loss. With an override that is tracked, the system continues to provide value in the common case and degrades gracefully in the edge case.
Cloudflare's AI Code Review instance¶
Cloudflare's framing:
"If a human reviewer comments `break glass`, the system forces an approval regardless of what the AI found. Sometimes you just need to ship a hotfix, and the system detects this override before the review even starts, so we can track it in our telemetry and aren't caught out by any latent bugs or LLM provider outages."
Key design choices:
- Trigger is a specific string (`break glass`) in a merge-request comment: discoverable, reproducible, greppable.
- Detected before the review starts, so the system doesn't waste tokens running an AI review it's about to override.
- Forces approval regardless of AI output — the override is unconditional; no "but only if the diff is small" caveats.
- Tracked in telemetry — every use is counted, dated, attributed to the commenter.
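A minimal sketch of how these choices compose, assuming a hypothetical comment structure and callables (`run_ai_review`, `record_telemetry` are illustrative names, not Cloudflare's API):

```python
# Hypothetical sketch: check for the break-glass trigger *before*
# spending any tokens on an AI review. All names are illustrative.
BREAK_GLASS_TRIGGER = "break glass"

def review_gate(mr_comments, run_ai_review, record_telemetry):
    """Return 'approved' on an override, otherwise the AI verdict."""
    for comment in mr_comments:
        if BREAK_GLASS_TRIGGER in comment["body"].lower():
            # Unconditional override: no diff-size or severity caveats.
            record_telemetry(
                event="break_glass",
                author=comment["author"],
                timestamp=comment["created_at"],
            )
            return "approved"
    # Only reach the model when no one has broken the glass.
    return run_ai_review()
```

Note the ordering: the trigger scan is a cheap string match that runs even when the model provider is down, which is what makes the hatch useful during outages.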
Telemetry as the payoff¶
The tracking is what turns the escape hatch from "a way around the system" into "a first-class signal about the system". In Cloudflare's first 30 days:
- 288 break-glass invocations across 48,095 MRs — 0.6%.
- Pattern-matching the 0.6% reveals real-world breakdowns:
  - Clusters during known LLM provider outages (the system was down when the user broke glass).
  - Repeated false-positive patterns (the same class of finding, from the same author, over time).
  - High-frequency users (signalling either override abuse or a chronically mis-prioritised review surface).
A break-glass rate that trends up over time is a signal that the AI's judgment quality is degrading. A rate that stays flat and low is evidence the AI is usefully in the loop.
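As an illustration of the two signals above (not Cloudflare's actual pipeline), the overall rate and a week-by-week trend can both be derived from a plain log of invocation dates:

```python
# Illustrative sketch: derive the two telemetry signals from a log
# of break-glass invocation dates. Function names are assumptions.
from collections import Counter
from datetime import date

def break_glass_rate(invocations, total_mrs):
    """Overall override rate, e.g. 288 / 48,095 is roughly 0.6%."""
    return len(invocations) / total_mrs

def weekly_counts(invocation_dates):
    """Bucket invocations by ISO (year, week). A rising series
    suggests degrading AI judgment; a sharp one-week spike
    suggests a provider outage."""
    return Counter((d.isocalendar()[0], d.isocalendar()[1])
                   for d in invocation_dates)
```

The point is that the signal is free once every use is recorded: no extra instrumentation is needed beyond the hatch itself.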
The alternative-design traps the hatch avoids¶
- No override at all — every AI bug becomes a developer-productivity incident.
- Unconditional bypass via a config flag — removes the telemetry signal entirely; teams silently work around the system.
- Break-glass requires a separate UI — friction kills adoption; developers disable the review instead of invoking the hatch.
- Break-glass triggers an alert — punishes use; converts a useful signal into an adversarial one.
Generalisation¶
Any AI-governed gate benefits from a telemetry-tracked escape hatch with:
- A deliberately mundane trigger (a comment, a flag, a keyword).
- Detection before the AI runs, not after.
- No conditions (if you broke glass, you broke glass).
- First-class telemetry that feeds back into quality measurement.
Applies to: AI code review, AI change-management gates, AI-driven deployment approval, AI-mediated customer-service routing, AI-driven spend-approval thresholds.
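The four properties above can be expressed as a single generic wrapper. This is a hypothetical sketch of the shape, not any listed system's implementation; `gate`, `hatch_triggered`, and `record` are assumed names:

```python
# Hypothetical generalisation: wrap any AI-governed gate with a
# tracked, unconditional escape hatch. All names are illustrative.
def with_escape_hatch(gate, hatch_triggered, record):
    """gate: callable returning the AI decision (may be slow or down).
    hatch_triggered: cheap, mundane check run *before* the gate.
    record: telemetry sink; every use is counted, never alerted on."""
    def gated(request):
        if hatch_triggered(request):
            record(request)       # first-class signal, not a punishment
            return "approved"     # no conditions attached
        return gate(request)
    return gated
```

Because the wrapper never calls `gate` on the override path, the same pattern applies whether the gate is a code review, a deployment approval, or a spend threshold.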
Seen in¶
- sources/2026-04-20-cloudflare-orchestrating-ai-code-review-at-scale — canonical implementation; 288 invocations (0.6%) in first 30 days.
Related¶
- systems/cloudflare-ai-code-review — the production consumer.
- patterns/rollout-escape-hatch — the sibling pattern applied to deployment rollouts.
- patterns/emergency-bypass — sibling pattern for emergency-access flows.