PATTERN Cited by 1 source
Proof by compile-and-run¶
Intent¶
Validate a model's claim about runtime behaviour — most notably the claim "this is an exploitable bug" — by giving the agent a per-task scratch environment in which it writes triggering code, compiles it, runs it, reads the failure, adjusts its hypothesis, and tries again until the claim is materialised as a concrete proof or definitively refuted.
Canonical articulation¶
Cloudflare on Mythos Preview's proof-generation capability (Source: sources/2026-05-18-cloudflare-project-glasswing-what-mythos-showed-us):
"Finding a bug and proving it's exploitable are two different things, and Mythos Preview can do both. It writes code that would trigger the suspected bug, compiles that code in a scratch environment, and runs it. If the program does what the model expected, that's the proof. If it doesn't, the model reads the failure, adjusts its hypothesis, and tries again. The loop matters as much as the bugs it finds, because a suspected flaw without a working proof is speculation, and Mythos Preview closes that gap on its own."
The loop¶
hypothesis (e.g. "UAF in fn X gives an arbitrary write")
│
▼
write triggering code in scratch dir
│
▼
compile in scratch environment
│
▼
run
│
▼
expected behaviour observed?
├─ yes → PoC artifact attached to finding
└─ no
   │
   ▼
   read failure (stderr, segfault address, exit code, output)
   │
   ▼
   adjust hypothesis
   │
   └──── (loop back to "write triggering code")
The loop has two exit conditions: the run behaves as predicted (the proof), or an implicit attempt budget is exhausted and the hypothesis is abandoned. Cloudflare doesn't disclose the loop budget.
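The loop above can be sketched in a few lines. This is a minimal illustration, not Cloudflare's harness: Python stands in for the target toolchain (byte-compilation plays the role of the compile step), and the names `prove`, `propose`, `expected`, `Attempt` and the default `max_attempts` are all hypothetical.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Attempt:
    source: str       # the triggering code that was tried
    returncode: int   # exit status of the compile step or the run
    output: str       # stdout + stderr, readable by the next proposal


def prove(
    propose: Callable[[list[Attempt]], str],
    expected: Callable[[Attempt], bool],
    max_attempts: int = 5,  # assumed budget; the real one is undisclosed
) -> tuple[Optional[Attempt], list[Attempt]]:
    """Return (proof, history); proof is None if the budget ran out."""
    history: list[Attempt] = []
    for _ in range(max_attempts):
        source = propose(history)  # hypothesis -> triggering code
        with tempfile.TemporaryDirectory() as scratch:
            poc = Path(scratch) / "poc.py"
            poc.write_text(source)
            # "Compile": byte-compile so syntax errors surface before the run.
            step = subprocess.run(
                [sys.executable, "-m", "py_compile", str(poc)],
                capture_output=True, text=True, cwd=scratch,
            )
            if step.returncode == 0:
                step = subprocess.run(
                    [sys.executable, str(poc)],
                    capture_output=True, text=True, cwd=scratch,
                )
        attempt = Attempt(source, step.returncode, step.stdout + step.stderr)
        history.append(attempt)
        if expected(attempt):
            return attempt, history  # PoC artifact attached to the finding
    return None, history             # budget exhausted, claim unproven
```

The `history` passed back into `propose` is what makes this a failure-reading loop rather than a blind retry: each new attempt can inspect the exit status and output of every earlier one. Time and memory limits are deliberately left out of this sketch.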
Why the loop matters as much as the bugs¶
A finding without a proof is a hedge: a human must still reproduce the issue manually before deciding whether to act. The loop shifts that verification work from the human triage queue to the model's runtime, producing actionable findings rather than "is this even real?" tickets. In Cloudflare's words: "a finding that arrives with a PoC is a finding you can act on."
The pattern is the AI-vuln-research instance of the agent self-correction loop — "if the program does what the model expected, that's the proof. If it doesn't, the model reads the failure, adjusts its hypothesis, and tries again." The explicit failure-reading step is what distinguishes this loop from blind re-trying.
Why per-task scratch isolation is load-bearing¶
A scratch environment shared across tasks would break down in three ways:
- Untrusted code execution. The agent runs code that, by construction, is trying to misbehave. A shared environment lets an exploit attempt corrupt state for other tasks.
- Concurrent runs. ~50 hunters running concurrently (per Cloudflare's harness) means overlapping compile/run workloads that must be partitioned per task so their files and build outputs never collide.
- Reproducibility. Each finding's PoC must be reproducible standalone; a shared env couples tasks invisibly.
Cloudflare's solution is "a per-task scratch directory" inside the Hunt-stage tooling. This is the ephemeral-sandbox shape — created per-task, destroyed after.
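A per-task scratch directory with a create-use-destroy lifetime might look like the following. This is a sketch under assumptions: the function name `run_in_scratch`, the `hunt-` prefix and the `poc.py` filename are invented for illustration, and Python again stands in for whatever the PoC is actually written in.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_scratch(task_id: str, source: str) -> tuple[int, str, Path]:
    """Run `source` inside a fresh directory that is deleted afterwards."""
    with tempfile.TemporaryDirectory(prefix=f"hunt-{task_id}-") as scratch:
        scratch_path = Path(scratch)
        (scratch_path / "poc.py").write_text(source)
        result = subprocess.run(
            [sys.executable, "poc.py"],
            cwd=scratch_path,   # relative paths resolve inside the scratch dir
            capture_output=True,
            text=True,
        )
    # The directory and everything the PoC wrote into it are gone by now.
    return result.returncode, result.stdout, scratch_path
```

Two tasks that both write `state.txt` never see each other's file, because each gets its own directory. A directory only partitions the filesystem view; processes, memory and kernel resources are still shared with the host, so stronger isolation has to be layered on separately.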
What the loop produces beyond a yes/no¶
The compile-and-run output carries information beyond "is this a bug?":
- Exact reproduction recipe. The triggering source code is the proof.
- Behavioural signature. Segfault address, RIP value, side-channel observation — all of which inform severity.
- Constraints on exploit shape. "This bug needs N bytes of input", "this bug requires control over heap layout" — useful for the reachability question downstream.
- Negative results. When the loop refutes the hypothesis, the failure log (the run that didn't reproduce) is itself triage signal — separating "I tried and it didn't work" from "I just guessed".
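The four outputs listed above could be carried on a single record attached to each finding. The sketch below is hypothetical throughout (`Finding`, `describe_exit` and every field name are assumptions, not Cloudflare's schema); it relies only on the POSIX convention that Python's `subprocess` reports death-by-signal as a negative return code.

```python
import signal
from dataclasses import dataclass, field
from typing import Optional


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a behavioural signature."""
    if returncode < 0:
        # On POSIX, a negative value is the negated number of the fatal signal.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"


@dataclass
class Finding:
    hypothesis: str
    poc_source: Optional[str] = None   # reproduction recipe, if proven
    signature: Optional[str] = None    # e.g. the result of describe_exit
    constraints: list[str] = field(default_factory=list)
    failed_attempts: list[str] = field(default_factory=list)  # negative results

    @property
    def status(self) -> str:
        if self.poc_source is not None:
            return "proven"
        # Distinguish "tried and it didn't work" from "just guessed".
        return "tried-not-reproduced" if self.failed_attempts else "unverified-guess"
```

Keeping `failed_attempts` on the record is what lets triage tell a refuted hypothesis apart from one that was never exercised.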
Loop-failure modes¶
- Hypothesis-is-fundamentally-wrong. No amount of retrying produces the expected behaviour. A budget cap or give-up criterion is required; Cloudflare doesn't disclose theirs.
- Side-effect bleed. If the scratch isolation isn't airtight (e.g. shared kernel resources), one task's cleanup-failure leaks into the next task. Per-process or per-VM isolation may be needed at higher attempt counts.
- Compile-time blow-up. Some PoC attempts produce code that takes minutes to compile (template-heavy C++, recursive macros). Per-task time bounds matter.
- Resource exhaustion. A PoC that allocates aggressively in the scratch env can OOM the host; per-task resource caps mitigate.
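Two of these failure modes, runaway time and runaway memory, can be bounded per attempt with standard POSIX facilities. The following is a sketch assuming a Linux host; `run_capped` and its default limits are hypothetical, and it addresses neither side-effect bleed nor a fundamentally wrong hypothesis.

```python
import resource
import subprocess
import sys
from typing import Optional


def run_capped(
    argv: list[str],
    cwd: str,
    wall_seconds: float = 10.0,          # assumed per-attempt time bound
    max_bytes: int = 512 * 1024 * 1024,  # assumed address-space cap
) -> tuple[str, Optional[int]]:
    """Run argv with a wall-clock timeout and a memory cap (POSIX only)."""

    def limit_memory() -> None:
        # Runs in the child just before exec; caps its virtual address space.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    try:
        done = subprocess.run(
            argv, cwd=cwd, capture_output=True, text=True,
            timeout=wall_seconds, preexec_fn=limit_memory,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return "timeout", None
    return "finished", done.returncode
```

The same wrapper can be applied to the compile step, which covers the compile-time blow-up case. `RLIMIT_AS` is enforced on Linux but not reliably on every platform, so a production harness would more likely rely on cgroups, containers or VMs.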
Generalises beyond vulnerability research¶
The pattern is a specialisation of the broader "agent-with-execution-substrate" shape that appears elsewhere on the wiki:
- patterns/agent-spawn-parallel-exploration — Vercel's Turborepo agents have "benches to check your work" — performance hypothesis verified by running benchmarks. Same shape, different output.
- Code-fix verification — "did the patch fix the bug?" answered by re-running the failing test.
- Schema-validation loops — "does this output match the schema?" verified by running the validator (see patterns/report-agent-self-validates-schema).
The common structure is a model claim, a runnable verifier, and a feedback loop that lets the model adjust. Together these let AI-driven systems obtain externally checked results even when the model itself cannot be relied on to be correct.
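Stripped of the vulnerability-research specifics, the shared structure reduces to a small generic loop. The sketch below uses the schema-validation case as its verifier; `verify_with_feedback`, `check_schema` and the `REQUIRED` keys are illustrative assumptions rather than anything from the cited pages.

```python
from typing import Callable, Optional, TypeVar

Claim = TypeVar("Claim")


def verify_with_feedback(
    propose: Callable[[Optional[str]], Claim],
    verify: Callable[[Claim], Optional[str]],
    max_rounds: int = 3,  # assumed budget
) -> Optional[Claim]:
    """Return the first claim the verifier accepts, or None."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        claim = propose(feedback)  # the model adjusts using the last failure
        feedback = verify(claim)   # None means "verified"
        if feedback is None:
            return claim
    return None


REQUIRED = {"title", "severity"}  # hypothetical report schema


def check_schema(report: dict) -> Optional[str]:
    """A runnable verifier: names the missing keys, or returns None."""
    missing = REQUIRED - set(report)
    return f"missing keys: {sorted(missing)}" if missing else None
```

Swapping `check_schema` for a benchmark runner or a failing test gives the performance and code-fix variants; only the verifier changes, not the loop.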
Cost / requirements¶
- Sandbox infrastructure — per-task scratch directories, process isolation, time/memory limits. Needs to handle ~50 concurrent tasks at Cloudflare scale.
- Compile toolchain available to the agent — with the programming languages the target codebase uses.
- Executable substrate — for system-level vulnerabilities, the sandbox needs to execute the same kind of binary the bug applies to (kernel exploits need kernel-running test envs, etc.).
- Time budget — compile-and-run loops can iterate for many seconds per attempt; the harness must accommodate per-task durations.
Seen in¶
- sources/2026-05-18-cloudflare-project-glasswing-what-mythos-showed-us — canonical wiki articulation; the second of two capability deltas Cloudflare credits Mythos Preview with.
Related¶
- concepts/proof-of-exploitability — the artifact produced.
- concepts/exploit-chain-construction — the upstream reasoning that each compile-and-run iteration validates.
- concepts/agent-self-correction-loop — the broader loop pattern.
- concepts/durable-vs-ephemeral-sandbox — the scratch-isolation property.
- patterns/multi-stage-vulnerability-discovery-harness — the harness that runs the loop at scale.
- patterns/narrow-scoped-agent-task — the per-task scope the loop runs inside.
- patterns/agent-spawn-parallel-exploration — sibling pattern in the performance domain.
- systems/mythos-preview — the capability provider.
- systems/cloudflare-vulnerability-discovery-harness — the canonical implementation.