PATTERN Cited by 1 source
Preemptive low-severity incident for potential impact
The pattern
Declare a low-severity incident (SEV4 / SEV5) before any customer impact is observed, on the basis of elevated risk from an external event — "in preparation for the worst." The declaration creates a shared coordination channel, documentation surface, and timeline in advance of potential customer harm, so if the harm materialises, incident response is already bootstrapped.
Canonical quote (Source: sources/2025-06-20-redpanda-behind-the-scenes-redpanda-clouds-response-to-the-gcp-outage):
"At this point, it was clear that multiple GCP services were experiencing a global outage, despite not having received support tickets from our customers or being paged by Redpanda Cloud alerts. So, in preparation for the worst, we preemptively created a low-severity incident to coordinate the response to multiple potential incidents."
The decision at 19:08 UTC — 27 minutes after being notified of the GCP outage by the GCP TAM, with no customer tickets and no internal alerts — is the load-bearing instance.
Why declare preemptively
- Coordination surface from t=0. The incident doc, Slack channel, and commander role are ready before the first real signal arrives — no scramble.
- Multi-customer / multi-signal coordination. Where a single potential incident would be handled case-by-case, an N-potential-incident scenario (one per customer or region) needs a shared context to avoid duplicate investigation.
- Timeline captured as it happens. Post-incident review is easier when the chat log, actions taken, and decisions made are recorded in real time rather than reconstructed from memory afterwards.
- Psychological primer. On-call staff shift from routine-ops mode to incident-response mode earlier, which speeds up the response if customer impact does materialise.
- SEV4 is cheap. Low-severity incidents don't page executives or trigger external communication; the cost of opening one is near-zero.
When to declare preemptively
The trigger is elevated probability of customer impact, not confirmed impact. Examples:
- Cloud-provider global outage announcement. Your dependency tier is affected; downstream customer impact is plausible but not yet observed.
- Third-party vendor outage of a critical-path dependency (payment gateway, identity provider, DNS).
- Known-bad deployment in progress. A rollback is underway after smoke-test failure; wider impact is possible.
- Observable-anomaly without confirmed user harm. Error rates up on internal metrics but no customer tickets or alerts yet.
- Regional infrastructure event (power outage, natural disaster, network partition) that might degrade service.
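The trigger rule above can be sketched as a small decision function. This is a minimal illustration, not tooling described in the source: the `Trigger` names, the `RiskSignal` fields, and the `on_critical_path` check are all hypothetical, chosen only to mirror the examples in the list.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    """Risk signals from the list above (names are illustrative)."""
    CLOUD_PROVIDER_OUTAGE = "cloud_provider_outage"
    VENDOR_OUTAGE = "vendor_outage"
    BAD_DEPLOYMENT_ROLLBACK = "bad_deployment_rollback"
    INTERNAL_ANOMALY = "internal_anomaly"
    REGIONAL_EVENT = "regional_event"


@dataclass
class RiskSignal:
    trigger: Trigger
    on_critical_path: bool           # assumption: we know whether the dependency serves customers
    customer_impact_confirmed: bool  # tickets or alerts already received?


def should_declare_preemptive(signal: RiskSignal) -> bool:
    """Declare preemptively when impact is plausible but not yet confirmed."""
    if signal.customer_impact_confirmed:
        # Confirmed impact is no longer "preemptive"; it belongs at a higher severity.
        return False
    return signal.on_critical_path


gcp_outage = RiskSignal(Trigger.CLOUD_PROVIDER_OUTAGE, on_critical_path=True,
                        customer_impact_confirmed=False)
unrelated_vendor = RiskSignal(Trigger.VENDOR_OUTAGE, on_critical_path=False,
                              customer_impact_confirmed=False)
print(should_declare_preemptive(gcp_outage))
print(should_declare_preemptive(unrelated_vendor))
```

The point of the sketch is that the input is a probability-raising event, not an observed failure; a real implementation would likely weigh more than a single boolean.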
Severity-level discipline
The pattern is coupled to a severity-ladder where:
- SEV4 / SEV5 = low-priority, no executive escalation, no customer comms, no paging beyond on-call engineer. Cheap to open, cheap to keep open, cheap to close.
- SEV3 = confirmed customer impact, investigation underway.
- SEV2 = confirmed widespread impact, customer comms started.
- SEV1 = full outage, all-hands response.
A preemptive SEV4 can escalate to SEV3 / SEV2 if the risk materialises. Conversely, it can close at SEV4 with no action needed if the risk dissipates — as in the 2025-06-12 Redpanda instance where the incident closed at SEV4 with no customer impact.
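A minimal sketch of this ladder, assuming the severity definitions given above. The `Incident` class, the `assess` helper, and the escalate-only rule in `reassess` are illustrative assumptions, not a description of any particular incident tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1  # full outage, all-hands response
    SEV2 = 2  # confirmed widespread impact, customer comms started
    SEV3 = 3  # confirmed customer impact, investigation underway
    SEV4 = 4  # low priority, on-call engineer only


def assess(confirmed_impact: bool, widespread: bool, full_outage: bool) -> Severity:
    """Map observed facts onto the ladder; no confirmed impact stays at SEV4."""
    if full_outage:
        return Severity.SEV1
    if widespread:
        return Severity.SEV2
    if confirmed_impact:
        return Severity.SEV3
    return Severity.SEV4


@dataclass
class Incident:
    severity: Severity = Severity.SEV4  # preemptive incidents start low
    is_open: bool = True

    def reassess(self, confirmed_impact: bool = False, widespread: bool = False,
                 full_outage: bool = False) -> Severity:
        target = assess(confirmed_impact, widespread, full_outage)
        # Lower number means higher severity. Assumption: reassessment only escalates;
        # downgrading would be a separate, explicit decision.
        if target < self.severity:
            self.severity = target
        return self.severity

    def close(self) -> None:
        self.is_open = False


escalated = Incident()
escalated.reassess(confirmed_impact=True)  # risk materialised: SEV4 -> SEV3

quiet = Incident()
quiet.reassess()                           # nothing confirmed: stays at SEV4
quiet.close()                              # risk dissipated: close at SEV4
```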
Composes with related patterns
- patterns/expiring-incident-mitigation — once in incident mode, other patterns (load shedding, failover) can be applied conditionally with auto-expiry on incident close.
- patterns/proactive-customer-outreach-on-elevated-error-rate — when to upgrade from preemptive-SEV to proactive-customer contact on observable signal.
- concepts/incident-mitigation-lifecycle — the preemptive SEV is the first stage of the mitigation lifecycle, before confirmation, triage, fix, and closure.
Variants
- Watch mode — declare preemptive SEV4 but take no action; just observe, and coordinate if escalation is needed.
- Prepositioned mitigation — declare preemptive SEV4 and pre-stage mitigations (e.g., warm up secondary regions, prepare DNS failover) to reduce latency if escalation comes.
- Customer-facing preemptive status — some orgs post status-page "Investigating" entries on public dashboards during preemptive SEV4, gaining transparency at the cost of potential false alarms.
Anti-patterns
- Declaring preemptive SEV3 or higher. Higher severities have escalation costs (pages, executive involvement) that are inappropriate for unconfirmed risk; false-alarm tolerance drops rapidly at SEV3+.
- Failing to close preemptive SEVs. If the risk dissipates and the SEV stays open, alert fatigue sets in and the pattern loses operational discipline.
- No incident command structure at SEV4. If SEV4 doesn't name a commander, the coordination value of the pattern is lost.
- No post-incident review for closed preemptive SEVs. Even if impact never materialised, the data from the near-miss (what the risk was, which controls worked, which did not) is load-bearing for future calibration.
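The four anti-patterns above lend themselves to a simple check over incident records. The following is a hedged sketch: the `PreemptiveIncident` record, the `lint` function, and the 24-hour `MAX_OPEN` threshold are all hypothetical and would need to be adapted to whatever an organisation's incident tooling actually stores.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

MAX_OPEN = timedelta(hours=24)  # assumed threshold for "left open too long"


@dataclass
class PreemptiveIncident:
    severity: int                       # 1 (highest) to 5 (lowest)
    commander: Optional[str]
    opened_at: datetime
    closed_at: Optional[datetime] = None
    review_done: bool = False


def lint(incident: PreemptiveIncident, now: datetime) -> List[str]:
    """Return the anti-patterns this preemptive incident currently exhibits."""
    problems = []
    if incident.severity <= 3:
        problems.append("declared at SEV3 or higher without confirmed impact")
    if incident.commander is None:
        problems.append("no incident commander named")
    if incident.closed_at is None and now - incident.opened_at > MAX_OPEN:
        problems.append("left open after the risk window")
    if incident.closed_at is not None and not incident.review_done:
        problems.append("closed without a post-incident review")
    return problems


opened = datetime(2025, 6, 12, 19, 8, tzinfo=timezone.utc)
stale = PreemptiveIncident(severity=4, commander=None, opened_at=opened)
for problem in lint(stale, now=opened + timedelta(days=2)):
    print(problem)
```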
Redpanda timeline context
The 19:08 UTC preemptive SEV4 was the first element of a sequenced response:
| Time (UTC) | Event |
|---|---|
| 18:41 | GCP TAM notification |
| 18:42 | Impact assessment began |
| 18:43 | Observed degraded monitoring (third-party vendor partial outage) |
| 19:08 | Preemptive SEV4 declared |
| 19:23 | Cloud-marketplace vendor reported issues |
| 19:41 | Google identified root cause |
| 20:26 | Delayed alert notifications arrived |
| 20:56 | Proactive customer outreach began |
| 21:38 | Incident considered mitigated (severity unchanged at SEV4) |
The preemptive declaration bought 78 minutes of preparation time (19:08 to 20:26) before the first observable impact (20:26 alerts). During that window the team was organised rather than scrambling.
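The intervals in the table can be recomputed directly from the timestamps. A small sketch, assuming all events fall on the same UTC day; the dictionary keys and the `minutes_between` helper are illustrative names.

```python
from datetime import datetime

# Selected rows from the timeline table (UTC, same day assumed).
TIMELINE = {
    "tam_notification": "18:41",
    "sev4_declared": "19:08",
    "delayed_alerts": "20:26",
    "mitigated": "21:38",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


notice_to_declaration = minutes_between(TIMELINE["tam_notification"], TIMELINE["sev4_declared"])
declaration_to_alerts = minutes_between(TIMELINE["sev4_declared"], TIMELINE["delayed_alerts"])
declaration_to_mitigated = minutes_between(TIMELINE["sev4_declared"], TIMELINE["mitigated"])

print(notice_to_declaration, declaration_to_alerts, declaration_to_mitigated)
```

The first value is the gap between the GCP TAM notification and the declaration; the second is the preparation window between the declaration and the delayed alerts.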
Caveats
- Pattern requires a culture that doesn't penalise false alarms. If SEV4 closures without incident are seen as "crying wolf," teams will stop declaring preemptively and lose the value.
- Severity taxonomy must exist. SEV4 must be well-defined; some orgs conflate all SEVs into one escalation path.
- Not a substitute for good monitoring. Preemptive SEVs work best when paired with observability that will confirm real impact — otherwise the team is flying blind.
- Declaring a preemptive SEV can itself be an interruption. Opening one still costs engineer attention, so overuse is wasteful.
- Customer communication policy must be explicit. Preemptive SEVs that leak to customer comms can create reputational risk for risk that never materialises.
Seen in
- sources/2025-06-20-redpanda-behind-the-scenes-redpanda-clouds-response-to-the-gcp-outage — canonical instance: Redpanda declared preemptive SEV4 at 19:08 UTC in response to GCP's global outage, with no customer tickets and no internal alerts yet. The incident closed at SEV4 at 21:38 UTC with no observed customer impact — a successful preemptive declaration.