PATTERN Cited by 1 source
Expiring incident mitigation¶
Intent¶
Treat every protective control added during an incident as temporary by default — each ships with metadata (owner, incident reference, review date, removal hypothesis) and an explicit expiration. Keeping a mitigation permanent requires a conscious, documented decision after the incident, made deliberately rather than by inaction.
The goal is to break the "added, works initially, remains active without review, eventually blocks legitimate traffic" arc (see concepts/incident-mitigation-lifecycle) by giving every mitigation a retirement mechanism that fires if humans forget.
Shape¶
The install-time contract¶
Every incident-time control carries, as data co-located with the rule itself:
- Owner — individual or team responsible for the rule's lifecycle. Rotates with team membership, not rule identity.
- Incident reference — ticket ID or post-mortem URL that names why this rule exists.
- Expiration date — a default — e.g. "30 days from install", "end of next quarter". Can be extended, but never indefinitely by default.
- Removal hypothesis — a short statement of what would have to be true to retire the rule (e.g. "abuse traffic from fingerprint X remains below threshold for two weeks").
The review gate¶
On the expiration date (or earlier, if a precision drop triggers it):
- The rule's owner is paged / ticketed — not the on-call rotation, the actual owner.
- The review produces one of three outcomes: retire, extend with explicit justification, or convert to permanent (requires sign-off beyond the single owner).
- Extensions are bounded — an extension carries its own expiration date.
The audit dashboard¶
All active incident mitigations are queryable by:
- Age since install.
- Age since last review.
- Owner (and whether the owner is still at the company).
- Block rate over time.
- Precision over time (where measurable).
Stale rules — unreviewed past their expiration, owner departed, or block rate drifted — surface without requiring an ad-hoc audit.
Integration with the change-management surface¶
Install-time metadata is not a document in a wiki elsewhere; it lives in the rule itself. The config surface that serves the rule at runtime is the same surface that carries its lifecycle. Reviews happen in the same system where the rule is edited. Retirement is a regular change, not a special operation.
Why "just remember to review" doesn't work¶
- Volume. At scale, the number of active mitigations exceeds any single team's recall capacity.
- Rotation. Incident responders rotate; the person who installed a rule is rarely the person on-call two years later when it starts firing on legitimate users.
- Success amnesia. A mitigation that works produces no signal — there is nothing to remember about a rule doing its job silently.
- Documentation rot. Incident tickets and runbooks fragment across tools; the live rule and its originating context drift apart.
An expiration date is the only signal that fires whether or not anyone remembers the rule exists.
What breaks this pattern¶
- Extensions with no sunset. If "extend indefinitely" is a valid outcome, the discipline collapses back to the pre-pattern state. Extensions must carry their own expiration.
- Blanket owner migration. When a team reorganises, a mass-owner-transfer that doesn't force a review at transfer time leaves the rules unowned in practice.
- Silent tier-to-tier duplication. A rule added at the edge tier and mirrored at the service tier for belt-and- suspenders reasons — if only one side has expiration metadata, retiring the annotated one doesn't retire the duplicate.
- Review as rubber stamp. If the review UX makes "extend" the one-click default, the pattern gives cover without changing behaviour.
Contrast with related patterns¶
- patterns/emergency-bypass — the preventive sibling: make the emergency hatch official + audited at install time. Expiring-incident-mitigation is the hatch's retirement discipline. The two together cover the emergency-response lifecycle end-to-end.
- patterns/fast-rollback — the immediate sibling: a mitigation should be trivial to reverse the instant it stops working. Expiring-incident-mitigation forces an active reconsideration on a timer; fast-rollback keeps the reverse cost low enough to act on it.
- patterns/progressive-configuration-rollout — the deploy-time sibling: gates new mitigations through canaries so a bad rule fails small. Expiring-incident- mitigation gates old mitigations through a retirement review so a stale rule fails small.
Operational shape in practice¶
- Incident retrospective template asks explicitly: "Which emergency controls did we add? What is their expiration plan?" The question is mandatory, not optional. Absence of an answer fails the retrospective.
- Default expiration tracks incident severity: high- severity incidents may justify longer defaults (e.g. 90 days) to avoid churn during active threat campaigns; low- severity ones default shorter (e.g. 14 days).
- Rules surface with their telemetry: a review dashboard shows the rule's block count + precision time series alongside the expiration date, so reviewers see operating evidence, not just the rule's text.
- Cross-layer audits: at review time, check whether the rule has been duplicated at another tier; retire matching pairs together.
- Historical disclosure: retired rules remain queryable (disabled, not deleted) so the audit trail survives — in case the original threat pattern re-emerges.
Seen in¶
- sources/2026-01-15-github-when-protections-outlive-their-purpose — canonical wiki instance. GitHub's 2026-01-15 post-mortem explicitly commits to this pattern as part of its three- workstream remediation: "Treating incident mitigations as temporary by default. Making them permanent should require an intentional, documented decision. Post-incident practices that evaluate emergency controls and evolve them into sustainable, targeted solutions." The rollout status of the tooling is unspecified — the post states the commitment, not a shipped system. The failure mode that motivated it is the composite-fingerprint rule set whose stale members had been silently over-matching legitimate logged-out traffic.
Related¶
- concepts/incident-mitigation-lifecycle — the four-stage arc this pattern retires.
- concepts/defense-in-depth — the posture the pattern preserves in good working order.
- concepts/false-positive-management — the ongoing operational discipline at the detection-signal layer; expiring-incident-mitigation is the same discipline on the temporal axis.
- patterns/emergency-bypass — the preventive sibling at install time.
- patterns/fast-rollback — the reverse-cost sibling.
- patterns/cross-layer-block-tracing — the investigation discipline that makes stale-rule detection possible in layered-protection architectures.