PATTERN Cited by 1 source

Incident response calibrated to blast radius

Pattern

Scale incident-response intensity to the actual scope of harm, not to the conspicuousness of the incident. A loud, public, brand-visible incident with narrow blast radius can be handled with watchful waiting + a candid postmortem; a quiet, back-office incident with potential to damage customers requires full mobilisation even if nobody outside notices.

The mental model is:

               Customer impact?
              ┌────────┴────────┐
              ▼                 ▼
              No               Yes
              │                 │
              ▼                 ▼
           Public?        full incident
     ┌────────┴────────┐
     ▼                 ▼
Yes: bounded      No: move on
brand damage     (no response)
watchful waiting
+ postmortem
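
The tree above can be sketched as a small function. This is only an illustration of the ordering of the two questions; the names `Response` and `calibrate_response` are hypothetical and do not come from the Fly.io postmortem.

```python
from enum import Enum


class Response(Enum):
    FULL_INCIDENT = "full incident"
    WATCHFUL_WAITING = "watchful waiting + postmortem"
    NO_RESPONSE = "move on"


def calibrate_response(customer_impact: bool, public: bool) -> Response:
    """Pick a response mode from impact first, visibility second."""
    if customer_impact:
        # Impact alone decides escalation; visibility is never consulted here.
        return Response.FULL_INCIDENT
    if public:
        # Bounded brand damage: monitor, recover, then write it up.
        return Response.WATCHFUL_WAITING
    return Response.NO_RESPONSE
```

Note that `public` is only read once `customer_impact` is known to be false, which is the point of the pattern: a quiet incident with customer impact and a loud one with customer impact land in the same branch.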

Why it works

  • Incident-response attention is a finite resource. Treating every incident as maximum-intensity exhausts the on-call budget and crowds out the ones that actually matter.
  • The signal to escalate is impact, not visibility. A silent credential leak that puts customer data at risk is a bigger fire than a public Twitter account takeover (ATO) that posts a crypto scam, even though the Twitter event is more conspicuous.
  • Watchful waiting is a legitimate response mode when:
      ◦ Customers are not under attack.
      ◦ The compromised surface can't be used to escalate further (exfiltrate customer data, further phish users, move laterally).
      ◦ The brand damage is bounded and recoverable via the platform's standard account-recovery path.
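
One way to make the three watchful-waiting conditions explicit is a check that reports which condition fails, so the decision can be quoted in the postmortem. This is a minimal sketch; the `Scope` fields and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical summary of what responders have verified so far."""
    customers_under_attack: bool
    can_escalate_further: bool  # exfiltration, phishing users, lateral movement
    recoverable_via_standard_path: bool


def watchful_waiting_blockers(scope: Scope) -> list[str]:
    """Return the reasons watchful waiting is NOT acceptable (empty if it is)."""
    blockers = []
    if scope.customers_under_attack:
        blockers.append("customers are under attack")
    if scope.can_escalate_further:
        blockers.append("compromised surface can be used to escalate")
    if not scope.recoverable_via_standard_path:
        blockers.append("no standard recovery path bounds the brand damage")
    return blockers
```

An empty list means all three conditions hold. A scope such as `Scope(False, False, True)` would describe the Twitter ATO as the postmortem characterises it.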

Fly.io's canonical application

Fly.io's 2025-10-08 postmortem is the wiki's canonical instance. The @flydotio Twitter account was taken over, the attacker revoked sessions and rotated 2FA, and recovery was gated on X.com support (~15 hours). Rather than paging everyone for a full incident, Fly.io:

  1. Verified the blast radius was bounded. "Our users weren't under attack, and the account wasn't being used to further intercept customer accounts."
  2. Checked that the abuse was low-effort. "The attack was pretty chill: a not-very-plausible crypto scam that presumably generated $0 for the attackers."
  3. Quantified the brand exposure and judged it acceptable. "15+ hours of brand damage, and extra security engineering cycles burnt on watchful waiting."
  4. Did the minimum containment: audited 1Password access logs, revoked adjacent access for recent pullers, filed the recovery ticket with X.
  5. Let it roll. "So we let it roll, until we got our account recovered the next morning."
  6. Post-incident: candid postmortem + move the account behind phishing-resistant auth.

"We're obviously making a lot of noise about this now, but we were pretty quiet during the incident itself (beyond just 'We know. We knew 45 seconds after it happened. We know exactly how it happened. It's just a Twitter thing.')" (Source: sources/2025-10-08-flyio-kurt-got-got)

When it fits

  • The attacker uses the account for low-effort abuse (crypto scams, spam) rather than targeted abuse (spear-phishing customers from the legitimate account).
  • The compromised account is brand-only, not a privileged-access surface. An ATO on a social account fits; an ATO on a company-admin SaaS does not.
  • The org has high trust capital with users and can afford the candour tax of a public postmortem.
  • Recovery path is known and tractable even if slow.

When it doesn't fit

  • Customers are under active attack. Full mobilisation, no debate.
  • The attacker controls a privileged surface that could enable lateral movement. Full mobilisation.
  • The blast radius is uncertain. Default to "treat as critical until scoped," not "watchful waiting until proved critical." Pairs with patterns/preemptive-low-sev-incident-for-potential-impact.
  • Legal / regulatory exposure. Some incidents require notification even if the scope is small.
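
The "treat as critical until scoped" default can be sketched with tri-state signals, where `None` stands for "not yet verified". The function and parameter names below are hypothetical, and the returned strings are labels for illustration rather than an established severity scheme.

```python
from typing import Optional


def size_response(
    customers_under_attack: Optional[bool],
    privileged_surface: Optional[bool],
    notification_required: bool = False,
) -> tuple[str, bool]:
    """Return (response mode, whether a notification is still owed).

    None means "not yet scoped" and never resolves to watchful waiting.
    """
    signals = (customers_under_attack, privileged_surface)
    if any(s is True for s in signals):
        mode = "full mobilisation"
    elif any(s is None for s in signals):
        # Unknown scope defaults upward, not downward.
        mode = "treat as critical until scoped"
    else:
        mode = "watchful waiting"
    # Legal or regulatory duties are tracked separately from response size.
    return mode, notification_required
```

Keeping `notification_required` out of the mode calculation reflects the last bullet: a small-scope incident can still carry a notification duty without needing a larger response.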

Composition

  • Complements [[patterns/preemptive-low-sev-incident-for-potential-impact]]: open a low-sev for potential-but-unverified impact, then calibrate up or down based on actual scope. Fly.io's case shows the happy end: verified low scope, stayed low.
  • Composes with candid postmortem. A watchful-waiting response earns the right to a candid postmortem by being transparent about why the response was sized the way it was. The postmortem is the accountability lever.

Anti-patterns

  • "Public = critical" reflex. Treating social-media-visible incidents as top priority just because they're visible, regardless of customer impact. Wastes scarce incident response capacity.
  • "Private = ignorable" reflex. Treating quiet incidents as non-urgent because no one is watching. Customer-data compromise is often silent; harm to customers matters more than optics.
  • Over-escalation-as-CYA. Paging everyone every time so nobody can blame you for under-responding. Burns out the responders and devalues the paging channel.

Seen in

  • Fly.io Kurt Got Got (2025-10-08): canonical wiki instance. Watchful-waiting response to the @flydotio Twitter ATO explicitly justified by the narrow, low-effort, brand-only blast radius. Recovery via X.com support path over ~15 hours, no paging of the broader team, candid postmortem afterward (sources/2025-10-08-flyio-kurt-got-got).
Last updated · 517 distilled / 1,221 read