CONCEPT Cited by 4 sources

Threat modeling

Definition

Threat modeling is the discipline, originating in security engineering, of enumerating threats against a system before deciding on countermeasures. A canonical threat model artifact contains:

  1. A summary of the system (or change) under review.
  2. A comprehensive list of threats — "all the nasty things that an adversary might try."
  3. A description of how the system is resilient to each threat (or an explicit acknowledgment that it isn't).

Writing down threats forces the author to think like an adversary and to enumerate before filtering, which surfaces more risks than designing countermeasures reactively, one incident at a time.
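The three-part artifact can be sketched as a small data structure. This is a minimal illustration, not a format used by any of the cited sources: the `Threat` and `ThreatModel` names, their fields, and the storage-node example content are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Threat:
    """One entry in the threat list (hypothetical structure)."""
    description: str
    # How the system resists this threat. None records an explicit
    # acknowledgment that the system is NOT resilient to it.
    resilience: Optional[str] = None


@dataclass
class ThreatModel:
    """Summary + comprehensive threat list + per-threat resilience."""
    summary: str
    threats: list[Threat] = field(default_factory=list)

    def acknowledged_gaps(self) -> list[Threat]:
        """Threats written down but not (yet) countered."""
        return [t for t in self.threats if t.resilience is None]


# Illustrative content only; not taken from any source.
model = ThreatModel(
    summary="Add a background compaction job to a storage node",
    threats=[
        Threat(
            "Compaction crashes midway through a rewrite",
            resilience="Old extent is kept until the new one is checksummed",
        ),
        Threat("Bit flip on disk during the rewrite"),
    ],
)

gaps = [t.description for t in model.acknowledged_gaps()]
print(gaps)
```

Keeping `resilience` optional means a threat can be recorded before anyone knows how to counter it, which is the "enumerate before filtering" step in data form.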

Generalization: from security to durability

S3 adopts threat modeling as the structure of its durability reviews (see patterns/durability-review):

The process borrows an idea from security research: the threat model. The goal is to provide a summary of the change, a comprehensive list of threats, then describe how the change is resilient to those threats. In security, writing down a threat model encourages you to think like an adversary and imagine all the nasty things that they might try to do to your system. In a durability review, we encourage the same "what are all the things that might go wrong" thinking, and really encourage engineers to be creatively critical of their own code.

The same shape transfers: "adversary" becomes "failure mode," "attack" becomes "corruption / data-loss path." The structural benefits — comprehensive enumeration, explicit coupling between risks and countermeasures — remain.

Two specific properties the S3 team values

  1. It encourages authors and reviewers to really think critically about the risks we should be protecting against.
  2. It separates risk from countermeasures, and lets us have separate discussions about the two sides.

The second is the less-obvious payoff. In normal code review, "the risk" and "the mitigation" get argued in the same breath; you can win an argument about a specific fix while missing that the risk itself was mis-scoped. Threat modeling forces an explicit split.

Why the separation matters

Once risk and countermeasure are separate artifacts, the team can:

  • Prefer coarse-grained guardrails (simple mechanisms that kill whole classes of risks) over per-risk mitigations.
  • Notice when several risks would all be addressed by one structural change — and refactor rather than patch.
  • Hold the risk catalog constant across reviews, and spot when a new change should have triggered but didn't.

Andy Warfield, writing in the S3 source, states the guardrail preference explicitly:

When we are identifying those protections, we really focus on identifying coarse-grained "guardrails". These are simple mechanisms that protect you from a large class of risks. Rather than nitpicking through each risk and identifying individual mitigations, we like simple and broad strategies that protect against a lot of stuff.

ShardStore's executable specification (see systems/shardstore) is the canonical example of such a guardrail: one mechanism that defeats many classes of disk-layer durability bugs.
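Keeping the risk catalog and the guardrails as separate artifacts makes the questions above mechanical to ask. The sketch below is illustrative only: the risk IDs, risk descriptions, guardrail names, and the mapping between them are hypothetical and do not describe S3 or ShardStore.

```python
# Risk catalog: held constant across reviews (hypothetical entries).
risks = {
    "R1": "torn write after power loss",
    "R2": "stale index entry points at a reclaimed extent",
    "R3": "write lands at the wrong disk offset",
    "R4": "operator tooling deletes the wrong data",
}

# Countermeasures, kept separate from the catalog. Each guardrail
# lists the risk IDs it is believed to cover (assumed mapping).
guardrails = {
    "end-to-end checksums": {"R1", "R3"},
    "conformance tests against a reference model": {"R1", "R2", "R3"},
}


def uncovered(risks: dict, guardrails: dict) -> list:
    """Risk IDs that no guardrail claims to cover."""
    covered = set().union(*guardrails.values())
    return sorted(set(risks) - covered)


def rank_by_breadth(guardrails: dict) -> list:
    """Guardrails ordered by how many risks each one covers,
    broadest first; ties are broken alphabetically."""
    return sorted(guardrails, key=lambda g: (-len(guardrails[g]), g))


gaps = uncovered(risks, guardrails)
ranking = rank_by_breadth(guardrails)
print("uncovered:", gaps)
print("broadest first:", ranking)
```

Because the catalog does not change when a countermeasure is added or removed, a review can argue about whether "R4" is scoped correctly without arguing about any particular fix at the same time.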

Third generalization: to agentic-AI behavior envelopes

Byron Cook's 2026-02 interview (Source: sources/2026-02-17-allthingsdistributed-byron-cook-automated-reasoning-trust-ai) extends the shape one layer further — from security to durability to agentic-AI behavior correctness:

  • Adversary becomes out-of-envelope agent trajectory (anything the agent might do that the system should not permit).
  • Countermeasure becomes capability envelope + automated reasoning over composition (patterns/envelope-and-verify).
  • Coarse-grained guardrail becomes the envelope itself: one spec that kills entire classes of agent misbehavior, rather than per-action filters.

The structural benefits translate intact: comprehensive enumeration of "what could go wrong" before designing countermeasures; explicit separation of risk catalog from mitigation; preference for coarse-grained guardrails. AgentCore (see systems/bedrock-agentcore) is the runtime that enforces the envelope; concepts/automated-reasoning is what reasons about whether the envelope is tight enough.
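The envelope idea can be illustrated with a toy checker. This is a sketch under stated assumptions, not the AgentCore API and not automated reasoning: `Action`, `Envelope`, and `first_violation` are hypothetical names, and the code inspects one concrete trajectory at runtime, whereas automated reasoning would aim to cover every trajectory the agent composition could produce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """One step an agent takes (hypothetical shape)."""
    tool: str
    target: str


@dataclass(frozen=True)
class Envelope:
    """One coarse spec: the (tool, target-prefix) pairs the agent may use."""
    allowed: frozenset

    def permits(self, action: Action) -> bool:
        return any(
            action.tool == tool and action.target.startswith(prefix)
            for tool, prefix in self.allowed
        )


def first_violation(envelope: Envelope, trajectory: list) -> Optional[int]:
    """Index of the first out-of-envelope action, or None if all are allowed."""
    for index, action in enumerate(trajectory):
        if not envelope.permits(action):
            return index
    return None


# Illustrative envelope: the agent may read and write under docs/ only.
envelope = Envelope(frozenset({("read", "docs/"), ("write", "docs/")}))

trajectory = [
    Action("read", "docs/design.md"),
    Action("write", "docs/summary.md"),
    Action("delete", "prod/database"),
]

violation = first_violation(envelope, trajectory)
print(violation)
```

The single `allowed` set plays the role of the coarse-grained guardrail: it rejects every action outside the envelope without a filter written per action.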

This makes agent behavior the third domain to which the same discipline has been applied: security threat models → durability reviews (S3) → agent-behavior envelopes (Bedrock). Each time, "adversary" is reinterpreted; the rest of the methodology stays put.

Seen in

  • sources/2025-02-25-allthingsdistributed-building-and-operating-s3 — original security framing and the durability-review generalization.
  • sources/2026-02-17-allthingsdistributed-byron-cook-automated-reasoning-trust-ai — further generalization to agentic-AI safety via patterns/envelope-and-verify and concepts/automated-reasoning over agent composition.
  • sources/2025-04-30-meta-building-private-processing-for-ai-tools-on-whatsapp — Meta uses threat modeling as the structural spine of the Private Processing announcement, not as an appendix. The model follows the canonical three-part structure. Assets: messages in flight and in draft, plus the secondary assets that support their confidentiality (the trusted computing base (TCB) of the confidential virtual machine (CVM), the underlying hardware, and cryptographic keys in transit). Threat actors, in three named classes: malicious or compromised insiders with infrastructure access; third-party or supply-chain vendors with component access; and malicious end users targeting other users on the platform. Threat scenarios with named tactics, techniques, and procedures (TTPs): (a) external actors exploit the product attack surface or compromise services running in CVMs, including AI-specific attacks such as prompt injection; (b) internal or external attackers extract messages exposed through the CVM via observability side-channels; (c) insiders with physical or remote access tamper with the CVM at boot or at runtime. New wiki axis: this is the first canonical wiki instance of threat modeling applied to a confidential-computing and private-AI-inference target. The discipline's structural benefits transfer intact from the S3 durability-review generalization: comprehensive enumeration before designing countermeasures, explicit separation of risk catalog from mitigation, and a preference for coarse-grained guardrails. Meta's "enforceable guarantees" requirement is the coarse guardrail here: a single mechanism under which "modification attempts fail closed or become publicly discoverable", covering many specific failure modes. The three elements also get an AI-specific reinterpretation: "adversary" now includes adversarial inputs to the model (prompt injection, jailbreaking) in addition to adversaries against the system; "attack surface" now includes the inference API; "mitigation" now includes hardened containerisation, input sanitisation, and restricted entry points. Meta's stance is that threat modeling plus defence-in-depth plus verifiable transparency is the generalised answer, with each named control mapped to a named scenario. This is a strong illustration of why threat modeling's "explicit separation of risk from countermeasure" property is load-bearing when reasoning about defences composed across layers.
  • sources/2026-06-12-dropbox-mcp-dash-design-to-code-security — Fourth generalization: automated enforcement at code-review time. Dropbox quantifies the core failure mode of threat modeling: requirements are documented but disconnected from implementation (only 12% of PRs link back; median 5-week delay between review and PR). Their system uses semantic retrieval (via Dash MCP) to automatically surface relevant threat models during code review; an LLM then reasons across both documents to identify gaps. This is the first wiki instance of automated, continuous verification that threat model requirements are reflected in code. It closes the loop that all prior generalizations leave open: they produce the model but do not verify ongoing compliance. See concepts/design-to-code-traceability, patterns/automated-design-compliance-review.
Last updated · 542 distilled / 1,571 read