CONCEPT Cited by 1 source

Proof of exploitability

Definition

A proof of exploitability is runnable code that demonstrates that a suspected vulnerability is actually exploitable. It closes the gap between "I found a flaw in the code" and "I have a working trigger that produces the expected behaviour." In Cloudflare's verbatim formulation:

"a suspected flaw without a working proof is speculation, and Mythos Preview closes that gap on its own." (Source: sources/2026-05-18-cloudflare-project-glasswing-what-mythos-showed-us.)

The proof shape Cloudflare describes:

"It writes code that would trigger the suspected bug, compiles that code in a scratch environment, and runs it. If the program does what the model expected, that's the proof. If it doesn't, the model reads the failure, adjusts its hypothesis, and tries again. The loop matters as much as the bugs it finds."

This loop — write trigger → compile → run → read failure → adjust hypothesis → retry — is the proof-by-compile-and-run pattern.
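
The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cloudflare's harness: `prove`, `attempt_once`, `propose`, and `as_expected` are hypothetical names, the trigger is a Python script rather than whatever language the real proofs use, and the "compile" step is only a syntax check via the built-in `compile()`. The `propose` callback stands in for the model: it receives the history of failed attempts and returns the next trigger to try.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    source: str     # trigger code that was tried
    compiled: bool  # whether the compile step succeeded
    output: str     # compiler error, or stdout + stderr of the run


def attempt_once(source: str, scratch: Path, timeout: float = 10.0) -> Attempt:
    """Write the trigger, compile it, and run it inside the scratch directory."""
    try:
        compile(source, "trigger.py", "exec")  # compile step (syntax check only)
    except SyntaxError as err:
        return Attempt(source, False, str(err))
    script = scratch / "trigger.py"
    script.write_text(source)
    try:
        proc = subprocess.run(
            [sys.executable, script.name],
            cwd=scratch, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return Attempt(source, True, "timed out")
    return Attempt(source, True, proc.stdout + proc.stderr)


def prove(
    propose: Callable[[List[Attempt]], str],
    as_expected: Callable[[Attempt], bool],
    max_attempts: int = 5,
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Retry until a run matches the hypothesis or the attempt budget runs out."""
    history: List[Attempt] = []
    with tempfile.TemporaryDirectory() as scratch:
        for _ in range(max_attempts):
            attempt = attempt_once(propose(history), Path(scratch))
            history.append(attempt)  # failures are kept so propose() can read them
            if attempt.compiled and as_expected(attempt):
                return attempt, history  # the run did what was predicted: a proof
    return None, history  # budget exhausted: still only a hypothesis
```

The returned `history` is the part that carries the "read failure, adjust hypothesis" step: each failed attempt's output is available to the next call of `propose`.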

Why proofs are load-bearing for AI vuln research

Without a proof, a model-generated finding is a hypothesis hedged with "possibly", "potentially", or "could in theory", which is exactly the model-bias-toward-finding-something failure mode that drives the concepts/signal-to-noise-in-ai-vulnerability-triage problem. The triage queue floods with hedged findings that humans must dismiss one by one.

Cloudflare's verbatim statement on what proofs do to triage cost:

"A finding that arrives with a PoC is a finding you can act on, and it means far less time spent asking 'is this even real?'"

The proof is what decisively converts a finding from a hedge into an actionable item — both for prioritisation (severity is no longer speculative) and for remediation (the PoC reproduces the bug for the engineer fixing it).
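
One way to picture the effect on a triage queue is a split between findings whose PoC reproduced and everything else. The sketch below is illustrative only: the `Finding` fields, the 1-10 severity scale, and the `triage` function are assumptions made for this example, not Cloudflare's reporting schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Finding:
    title: str
    claimed_severity: int           # hypothetical 1-10 scale
    poc_path: Optional[str] = None  # where the proof artifact lives, if any
    poc_reproduced: bool = False    # did the PoC run as the finding predicted?


def triage(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Split findings into (actionable, speculative)."""
    actionable = [f for f in findings if f.poc_path and f.poc_reproduced]
    speculative = [f for f in findings if not (f.poc_path and f.poc_reproduced)]
    # Severity is only trusted as a sort key once a proof backs the finding.
    actionable.sort(key=lambda f: f.claimed_severity, reverse=True)
    return actionable, speculative
```

Findings in the first list can go straight to prioritisation and remediation; the second list is the pile that still needs someone to ask "is this even real?".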

Where the capability lives in the stack

Proof generation is one of two capabilities Cloudflare names as the difference between Mythos Preview and previous general-purpose frontier models — the other being exploit chain construction. Together they define the cyber frontier model class.

Other frontier models "would identify an interesting bug, write a thoughtful description of why it mattered, and then stop" — they could find but not prove. Proof generation closes the find-to-prove gap inside a single agent loop.

Why scratch-environment isolation matters

Proof generation requires a scratch environment per task:

  • The model runs untrusted-by-construction code (its own attempted exploit).
  • The triggering inputs may corrupt memory, segfault the process, or leave the system in a broken state.
  • Multiple proof attempts may run concurrently across dozens of hunters and their sub-agents; per-task isolation keeps them from interfering.

In Cloudflare's harness this is achieved via "a per-task scratch directory" in the Hunt stage; ~50 hunters run concurrently, each fanning out to "a handful" of exploration sub-agents, all isolated.
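
A minimal sketch of the per-task scratch directory idea, assuming each proof attempt is a Python script and each hunter is a worker thread that launches it as a child process. The names `run_in_scratch` and `run_hunters` are hypothetical, and a temporary directory gives only filesystem separation, not a security sandbox; the source does not say what further isolation the real harness applies.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List


def run_in_scratch(task_id: str, trigger_source: str, timeout: float = 10.0) -> dict:
    """Run one proof attempt in its own throwaway directory."""
    with tempfile.TemporaryDirectory(prefix=f"hunt-{task_id}-") as scratch:
        script = Path(scratch) / "trigger.py"
        script.write_text(trigger_source)
        try:
            proc = subprocess.run(
                [sys.executable, script.name],
                cwd=scratch,  # relative paths in the trigger stay inside scratch
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"task": task_id, "status": "timeout", "output": ""}
        # On POSIX, a negative return code means the child was killed by a
        # signal (for example SIGSEGV); the parent process is unaffected.
        status = "crashed" if proc.returncode < 0 else "exited"
        return {"task": task_id, "status": status,
                "returncode": proc.returncode, "output": proc.stdout}
    # The directory and anything the trigger left in it are deleted on exit.


def run_hunters(tasks: Dict[str, str], max_workers: int = 50) -> List[dict]:
    """Run many attempts concurrently; results come back in task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_in_scratch, tid, src) for tid, src in tasks.items()]
        return [f.result() for f in futures]
```

Because every attempt writes to its own directory, two triggers that both create a file with the same name do not overwrite each other, and a trigger that crashes takes down only its own child process.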

Distinguished from adjacent concepts

  • concepts/exploit-chain-construction — the reasoning about how primitives combine. Proof of exploitability is the demonstration that the chain works. A chain plan without a proof is unverified speculation.
  • Static analysis finding — a finding produced by reading the code; no runtime evidence. Proof of exploitability produces runtime evidence.
  • Fuzz-induced crash — a fuzzer can produce a crashing input without reasoning about why. Proof of exploitability carries an attached hypothesis the run validates or refutes.

Relationship to the hedge-finding tax

The wiki has an existing thread on false-positive management in detection systems (Figma's verbatim "manage false positives or they'll manage you"). Proof of exploitability is the AI-vuln-research analog: it doesn't prevent false positives upstream; it makes the cost of each finding's verification cheap enough to absorb, by moving verification work from human triage to model-driven runtime validation.

Open / not disclosed

  • Failure rate of proof attempts: Cloudflare says only that "if it doesn't, the model reads the failure, adjusts its hypothesis, and tries again"; it does not disclose how often the proof loop succeeds versus gives up.
  • Proof artifact format — what the proof looks like (a C program, a fuzzer harness, a Python driver, a binary) and whether it survives in the harness's reporting schema.

Seen in
