
CONCEPT Cited by 1 source

Destructive-automation blast radius

Definition

Destructive-automation blast radius is the scope of damage an automated destructive operation (delete, drop, terminate, detach, deregister) can reach before any isolation or review mechanism contains it. It is the blast radius concept specialised to the operation class most likely to cause an irreversible incident: deletion at fleet scale, typically driven by a daemon or supertool rather than a human.

Why destructive operations are a distinct class

Three properties separate them from other change classes:

  1. Irreversibility. Creation and mutation can usually be rolled back; deletion usually cannot. A mis-created resource can simply be deleted, but a mis-deleted resource has to be rebuilt from backups or reconstructed from memory.
  2. Interpretation sensitivity. Destructive code paths often have a "delete everything matching X" shape. Any flaw in computing the match set (typo, empty set, wrong predicate) can scale the blast radius from "zero resources" to "the entire fleet" with no intermediate state. See the supertool collapse-to-all failure mode.
  3. Pacing as an amplifier. The faster the daemon runs, the more resources it can destroy before a human or an alert notices the problem. Zalando's metadpata postmortem names this explicitly: "As part of cost-saving measures, the pacing of executing deletion operations was sped up."
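The interpretation-sensitivity property can be made concrete with a minimal sketch. Everything here is hypothetical (the `Account` type, the `metadata` config key, and the selector functions are illustration only; the source does not describe the actual code involved). It shows how a misspelled key can turn into an empty selector, and how an empty selector can silently mean "everything" unless it is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # Hypothetical fleet member, for illustration only.
    account_id: str
    category: str


class EmptySelectorError(ValueError):
    """Raised when a destructive run has no explicit target."""


def parse_selector(config):
    # Assumed config shape: the selector lives under the "metadata" key.
    # A misspelled key leaves "metadata" absent, so the selector is empty.
    return dict(config.get("metadata", {}))


def _matches(account, selector):
    return all(getattr(account, key) == value for key, value in selector.items())


def select_unsafe(fleet, selector):
    # Failure-prone shape: an empty selector matches every account,
    # because all() over zero conditions is True.
    return [account for account in fleet if _matches(account, selector)]


def select_guarded(fleet, selector):
    # Safer shape: "no targets specified" is an error, never "all targets".
    if not selector:
        raise EmptySelectorError("refusing to build a delete set from an empty selector")
    return [account for account in fleet if _matches(account, selector)]
```

With a config whose key is misspelled, `select_unsafe` returns the whole fleet while `select_guarded` raises before any delete set exists. There is no intermediate state between the two outcomes, which is why the guard belongs in the set computation rather than in the deletion loop.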

Named amplifiers

  • Cost-optimization on pacing — accelerating deletion to reduce idle-resource cost increases blast radius per-unit-time.
  • Unbounded operation scope — no per-run cap on the number of resources deleted.
  • Collapsed-predicate logic — code paths that interpret "no targets specified" as "all targets in scope".
  • Automation at review time — PR reviewers review the diff, not the set-computation outcome; the effective blast radius is invisible to review.
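Two of these amplifiers, unbounded scope and accelerated pacing, have direct counterparts in code. The sketch below is one possible shape, not something the source prescribes: `run_deletions`, `max_per_run`, `max_percent`, and `min_interval_s` are all hypothetical names, and `delete_fn` stands in for whatever actually removes a resource.

```python
import time


class BlastRadiusExceeded(RuntimeError):
    """Raised when a run would delete more than the configured cap."""


def run_deletions(targets, delete_fn, *, max_per_run, fleet_size,
                  max_percent=5, min_interval_s=0.0, sleep=time.sleep):
    """Delete targets only if the whole set fits under a per-run cap.

    The cap is the smaller of an absolute count and a percentage of the
    fleet. An oversized set is rejected outright rather than truncated,
    because an unexpectedly large match set suggests a broken predicate.
    """
    targets = list(targets)
    limit = min(max_per_run, fleet_size * max_percent // 100)
    if len(targets) > limit:
        raise BlastRadiusExceeded(
            f"{len(targets)} targets exceeds the per-run limit of {limit}"
        )

    deleted = []
    for index, target in enumerate(targets):
        # Pacing: wait between deletions so detection has time to catch up.
        if index > 0 and min_interval_s > 0:
            sleep(min_interval_s)
        delete_fn(target)
        deleted.append(target)
    return deleted
```

Raising on an oversized set, instead of deleting the first `limit` items, is a deliberate choice in this sketch: a partial run against a wrongly computed set is still a wrong deletion.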

Containment patterns (Zalando stack)

The metadpata postmortem catalogues four complementary layers of containment:

  1. Pre-delete scream test. Reversible simulated-deletion state left for 1 week; real deletion only after scream window elapses. See concepts/scream-test-for-deletion and patterns/scream-test-before-destructive-delete.
  2. Cost-weighted deferral. Low-savings resources are deleted manually, not by the automation, with a 7-day cost-threshold gate. See concepts/cost-weighted-deletion-deferral.
  3. Change-preview in PR. The change that configures the supertool is previewed per-account in the PR via CloudFormation ChangeSets. See patterns/pr-preview-of-cloudformation-changeset.
  4. Phased rollout across release channels. Every change graduates through playground → test → infra → production account categories. See patterns/phased-rollout-across-release-channels.
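The scream-test layer is essentially a small state machine with a time gate. A minimal sketch follows, assuming a one-week window as stated above; the `Resource` type, the state names, and the function names are hypothetical and not taken from Zalando's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

SCREAM_WINDOW = timedelta(weeks=1)  # the 1-week window named in the list above


class State(Enum):
    ACTIVE = "active"
    SIMULATED_DELETED = "simulated_deleted"  # reversible: looks gone, is not
    DELETED = "deleted"                      # irreversible


@dataclass
class Resource:
    name: str
    state: State = State.ACTIVE
    simulated_at: Optional[datetime] = None


def simulate_delete(resource, now):
    # Enter the reversible state and start the scream window.
    if resource.state is State.ACTIVE:
        resource.state = State.SIMULATED_DELETED
        resource.simulated_at = now


def restore(resource):
    # Someone screamed: undo the simulated deletion.
    if resource.state is State.SIMULATED_DELETED:
        resource.state = State.ACTIVE
        resource.simulated_at = None


def finalize_if_quiet(resource, now):
    # Really delete only once the full window has elapsed without a restore.
    if resource.state is not State.SIMULATED_DELETED or resource.simulated_at is None:
        return False
    if now - resource.simulated_at < SCREAM_WINDOW:
        return False
    resource.state = State.DELETED
    return True
```

The only path into `DELETED` passes through `SIMULATED_DELETED` and a full window, so a wrongly computed delete set surfaces as a reversible outage rather than as data loss.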

Load-bearing insight

The defining feature of destructive-automation blast radius is that none of the containment layers removes the supertool's authority; they only slow, stage, or preview its use. Zalando does not argue for removing the cleanup daemon: cleanup is a real operational need, and manual deletion across a fleet of thousands of accounts is infeasible. What changes is the control surface around the daemon: simulation before execution, preview before merge, gates before graduation.
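"Gates before graduation" can be sketched as a loop over the release channels listed earlier (playground, test, infra, production). This is a rough illustration under assumptions: `apply_fn` and `healthy_fn` are hypothetical callbacks, and the source does not say how health is judged between channels.

```python
CHANNELS = ("playground", "test", "infra", "production")


def rollout(change, apply_fn, healthy_fn):
    """Apply a change channel by channel, stopping at the first unhealthy one.

    Returns the channels the change actually reached. The daemon keeps its
    full authority; the gate only limits how far one change travels before
    its effects have been observed.
    """
    reached = []
    for channel in CHANNELS:
        apply_fn(change, channel)
        reached.append(channel)
        if not healthy_fn(change, channel):
            break  # do not graduate a change that misbehaved here
    return reached
```

A change that misbehaves in the test channel never reaches infra or production, so its worst case is bounded by the accounts in the channels it already touched.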

Distinguishing from adjacent concepts

  • concepts/blast-radius (general) — the parent framing across all change classes (creation, mutation, deletion, permission change, config change).
  • concepts/supertool — the actor (application with fleet-wide authority). Destructive-automation blast radius is what a destructive supertool's authority translates into in the worst case.
  • Incident blast radius — the scope of a single incident after it fires. Destructive-automation blast radius is a pre-incident sizing: how bad can the worst case be?

Seen in

  • sources/2024-01-22-zalando-tale-of-metadpata-the-revenge-of-the-supertools — the canonical wiki instance. A metadpata typo in one test account's config collapsed the account-lifecycle job's scope to "every account", causing fleet-wide Route 53 hosted-zone deletion. The five remediations catalogued in the post together define the containment stack for this concept class.