CLOUDFLARE 2026-05-18 Tier 1

Cloudflare — Project Glasswing: what Mythos showed us¶

One-paragraph summary¶

Cloudflare engineering writeup (2026-05-18) of several months of running Mythos Preview — Anthropic's cyber frontier model preview, made available under Project Glasswing without the additional safeguards present in generally-available models like Opus 4.7 or GPT-5.5 — against more than fifty of Cloudflare's own repositories spanning "runtime, edge data path, protocol stack, control plane, and the open-source projects we depend on." Two capability deltas defined the jump from previous frontier models: (1) exploit chain construction — the model takes several small attack primitives (use-after-free → arbitrary read/write → control- flow hijack → ROP chain) and reasons about combining them into a working proof rather than reporting them in isolation; (2) proof generation — the model writes code that would trigger a suspected bug, compiles it in a scratch environment, runs it, reads the failure, adjusts its hypothesis, and tries again. Cloudflare also names a structural inconsistency: "the model organically pushes back on certain requests" via emergent guardrails — "the same task, framed differently or presented in a different context, could produce completely different outcomes" — making those organic refusals real but not consistent enough to serve as a complete safety boundary; this is precisely why future generally-available cyber frontier models "must include additional safeguards on top of this baseline behavior." The post then argues — with the longest section of the article behind it — that pointing a generic coding agent at an arbitrary repository does not work for vulnerability research because two model-shape mismatches dominate: (a) context — coding agents are tuned for one focused hypothesis at a time, but vulnerability research is "narrow and parallel by nature"; against a hundred-thousand-line repository a single agent session covers "maybe a tenth of a percent of the surface in a useful way before the model's context window fills up and compaction kicks in — potentially discarding earlier findings" (canonical AI-vuln-research instance of context rot degrading coverage); (b) throughput — a single-stream agent serialises work that real codebases need to fan out across. Four lessons followed: narrow scope produces better findings; adversarial review reduces noise (a second agent with a different prompt and a different model, with no ability to emit findings of its own, between the initial finding and the queue); splitting the chain across agents produces better reasoning ("is this code buggy?" and "can an attacker actually reach this bug from outside?" are two different questions, and the model is better at each separately); parallel narrow tasks beat one exhaustive agent. These four lessons compose into Cloudflare's vulnerability discovery harness, an 8-stage multi-agent pipeline ( Recon → Hunt → Validate → Gapfill → Dedupe → Trace → Feedback → Report) running ~50 hunters concurrently, each with exploration sub-agents and per-task scratch directories for compile-and-run proofs, adversarial validators with no finding-emission ability, and a cross-repo tracer that fans out one tracer instance per consumer repository to decide whether attacker-controlled input "actually reaches the bug from outside the system" — explicitly treating reachability as the stage that "matters most" (turns "there is a flaw" into "there is a reachable vulnerability"). The closing argument inverts the security- team default reaction to faster vulnerability discovery (which the post flags as "two-hour SLA from CVE release to patch in production" — "more than one team we have spoken with"): patching faster doesn't change the shape of the pipeline that produces the patch; "if regression testing takes a day, you cannot get to a two-hour SLA without skipping it, and the bugs you ship when you skip regression testing tend to be worse than the bugs you were trying to patch" (Cloudflare admits to having tried letting the model write its own patches and "watched a few go out that fixed the original bug while quietly breaking something else the code depended on"). The better lever is architectural defense: defenses that sit in front of the application and block the bug from being reached, application design that limits blast radius if a flaw exists, and the ability "to roll out a fix to every place the code is running at the same moment, rather than waiting on individual teams to deploy it" — which is the shape Cloudflare's edge-platform products already take.

Key takeaways¶

Mythos Preview is a "different kind of tool doing a different kind of work" relative to general-purpose frontier models. Verbatim: "the jump from what was possible with previous general-purpose frontier models to what Mythos Preview does today is not just a refinement of what came before." Two capabilities defined the delta: exploit chain construction (turning use-after-free → arbitrary R/W → control-flow hijack → ROP into a working proof) and proof generation (write triggering code, compile in scratch, run, read failure, adjust). Verbatim on the second: "a suspected flaw without a working proof is speculation, and Mythos Preview closes that gap on its own."
Other frontier models could find the same underlying bugs but stopped short of stitching primitives into a working chain. Verbatim: "A model would identify an interesting bug, write a thoughtful description of why it mattered, and then stop, leaving the actual chain unfinished and the question of exploitability open. What changed with Mythos Preview is that a model can now take those low-severity bugs (which would traditionally sit invisible in a backlog) and chain them into a single, more severe exploit." The architectural consequence — for the post and for the wiki — is that chained-primitive exploit construction reclassifies low-severity backlog bugs as actual security risk once a capable enough chainer exists.
Mythos Preview at Glasswing was provided without the additional safeguards present in GA models (Cloudflare names "Opus 4.7 or GPT-5.5" as the GA-tier comparison point). Despite that, the model exhibits emergent "organic" refusals — "much like the cyber capabilities that made it useful for vulnerability hunting, the model has its own emergent guardrails that sometimes cause it to push back on legitimate security research requests." Critical operational caveat: these refusals are real but inconsistent — "the same task, framed differently or presented in a different context, could produce completely different outcomes"; "the same request, framed differently, got a different answer, and even the same request can produce different outcomes across runs due to the probabilistic nature of the model. Semantically equivalent tasks can produce opposite outcomes depending on how and when they're presented to the model." Position stated: "any capable cyber frontier model made generally available in the future must include additional safeguards on top of this baseline behavior — making it appropriate for broader use outside of a controlled research context like Project Glasswing." Canonical wiki instance of model organic refusal inconsistency.
The signal-to-noise problem in AI-driven vulnerability triage has two dominant axes: programming language and model bias. On language: "C and C++ give you direct memory control and, with it, bug classes — buffer overflows, out-of-bounds reads and writes — that memory-safe languages like Rust eliminate at compile time. We saw consistently more false positives from projects written in memory-unsafe languages." On model bias — verbatim, the canonical wiki articulation of concepts/model-bias-toward-finding-something: "A good human researcher tells you what they found and how confident they are. Models don't. Ask a model to find bugs, and it will find them, whether the code has any or not. Findings come back hedged with 'possibly,' 'potentially,' 'could in theory,' and the hedged findings vastly outnumber the solid ones. That's a reasonable bias for an exploratory tool. It's a ruinous one for a triage queue, where every speculative finding spends human attention and tokens to dismiss, and that cost compounds across thousands of findings." Mythos Preview's PoC-attached findings move the needle in the direction of "a finding that arrives with a PoC is a finding you can act on". See concepts/signal-to-noise-in-ai-vulnerability-triage.
A generic coding agent pointed at a repo cannot do vulnerability research at coverage. Two structural reasons:
Context shape mismatch. "Coding agents are tuned for one focused stream of work: building a feature, fixing a bug, writing a refactor. They ingest a lot of source code, hold a single hypothesis at a time, and iterate against it. That's exactly the wrong shape for vulnerability research, which is narrow and parallel by nature." A human researcher does "one specific thing" — a single complex feature, transitions across security boundaries, a specific vulnerability class like command injection — and "then they do it again, for a different feature, security boundary, or vulnerability class, several thousand times across the codebase."
Coverage failure from context-window dynamics. Verbatim: "A single agent session (even with subagents) against a hundred-thousand-line repository can cover maybe a tenth of a percent of the surface in a useful way before the model's context window fills up and compaction kicks in — potentially discarding earlier findings that would have mattered." Canonical instance of context rot applied to AI vulnerability research; sibling failure mode to agent hyperfixation surfaced in Vercel's Turborepo agent experiment.
Throughput. "A single-stream agent does one thing at a time, but real codebases need many hypotheses against many components at once." The single-agent shape stops scaling not on the model but on "the shape of the interaction itself."
Four lessons crystallised from running the work at scale, each one motivating a harness component:
Narrow scope produces better findings. "Telling the model 'Find vulnerabilities in this repository' makes it wander. Telling it 'Look for command injection in this specific function, with this trust boundary above it, here's the architecture document and here's prior coverage of this area' makes it do something much closer to what a researcher would actually do." The scoping shape: one attack class + scope hint + architecture document + prior coverage of this area.
Adversarial review reduces noise. Cloudflare's verbatim formulation: "Adding a second agent between the initial finding and the queue — one with a different prompt, a different model, and no ability to generate its own findings — catches a lot of the noise that the first agent would miss if it just checked its own work. It turns out that putting two agents in deliberate disagreement is way more effective than just telling one agent to be careful." Three independence axes named: different prompt, different model, and the load- bearing no ability to generate its own findings — the validator can only refute, not contribute, which prevents the validator from inflating the queue.
Splitting the chain across agents produces better reasoning. "Asking 'Is this code buggy?' and 'Can an attacker actually reach this bug from outside the system?' are two different questions, and the model is better at each one when you ask them separately, because each question is narrower than the combined version."*
Parallel narrow tasks beat one exhaustive agent. "Coverage improves when many agents work on tightly scoped questions and we deduplicate the results afterward, rather than asking one agent to be exhaustive."
The harness is an 8-stage pipeline, named explicitly stage-by-stage (canonical wiki instance of patterns/multi-stage-vulnerability-discovery-harness):

Stage	What it does	Why it matters
Recon	Reads the repo top-down, fans out to subagents per subsystem, produces an architecture document covering build commands, trust boundaries, entry points, and likely attack surface; generates the initial task queue.	"Gives every downstream agent shared context. Cuts the wander problem."
Hunt	Each task is one attack class + scope hint. ~50 hunters run concurrently, each fanning out to a handful of exploration sub-agents. Each hunter has tools that compile and run PoC code in a per-task scratch directory.	"This is where most of the work happens. Many narrow tasks in parallel, not one exhaustive agent."
Validate	An independent agent re-reads the code and tries to disprove the original finding. Different prompt, no ability to emit new findings.	"Catches a meaningful fraction of the noise the hunter wouldn't catch when reviewing its own work."
Gapfill	Hunters flag areas they touched but didn't cover thoroughly; those areas get re-queued.	"Counteracts the model's tendency to drift toward attack classes it has already had success with."
Dedupe	Findings sharing the same root cause collapse into a single record.	"Variant analysis is a feature, not a way to inflate the queue with duplicates."
Trace	For each confirmed finding in a shared library, a tracer agent fans out (one instance per consumer repo), uses a cross-repo symbol index, and decides whether attacker-controlled input actually reaches the bug from outside.	"Turns 'there is a flaw' into 'there is a reachable vulnerability.' This is the stage that matters most."
Feedback	Reachable traces become new hunt tasks in the consumer repos where the bug is exposed.	"Closes the loop. The pipeline gets better as it runs."
Report	A reporting agent writes against a predefined schema, fixes its own validation errors, and submits to an ingest API.	"Output is queryable data, not free-form prose."

Mythos Preview was used to build, tailor, and improve the harness it would then run inside — "We used Mythos Preview to build on, tailor, and improve our original harnesses to suit its strengths."

Faster patching is not the right primary response to AI-accelerated vulnerability discovery. Verbatim on the default reaction: "More than one team we have spoken with is now operating under a two-hour SLA from CVE release to patch in production." Cloudflare's structural counter: "Patching faster does not change the shape of the pipeline that produces the patch. If regression testing takes a day, you cannot get to a two-hour SLA without skipping it, and the bugs you ship when you skip regression testing tend to be worse than the bugs you were trying to patch." Cloudflare attempted model-authored patches and "watched a few go out that fixed the original bug while quietly breaking something else the code depended on" — a first-person datum on the regression cost. The proposed alternative is architectural defense (concepts/architectural-defense-vs-faster-patching): "defenses that sit in front of the application and block the bug from being reached", "designing the application so that a flaw in one part of the code cannot give an attacker access to other parts", and "being able to roll out a fix to every place the code is running at the same moment, rather than waiting on individual teams to deploy it." The dual-edged framing closes the post: "the same capabilities that helped us find bugs in our own code will, in the wrong hands, accelerate the attack side against every application on the Internet" — a Cloudflare-product pitch by way of stating that the architectural levers being advocated are precisely what its edge platform applies on behalf of customers.
C/C++ vs Rust as a confounder of AI-vuln signal-to-noise. "We saw consistently more false positives from projects written in memory-unsafe languages." Reinforces the wiki's standing memory-safety thread (Aurora DSQL, Meta wamedia, Dropbox Nucleus, Datadog Go) with a new axis: AI-vuln triage cost is itself a hidden tax of memory-unsafe substrates beyond the well-known exploit- surface tax — the model's exploratory bias produces more false-positive findings on C/C++ codebases that engineers then spend tokens and human attention dismissing.
The prompt for adversarial validators must explicitly forbid finding emission, not merely instruct adversarial review. The verbatim Cloudflare clause: "a second agent between the initial finding and the queue — one with a different prompt, a different model, and no ability to generate its own findings — catches a lot of the noise." Without the no-emission constraint the validator amplifies queue size instead of refining it. Strengthens the wiki's existing adversarial review persona thread (Atlassian Rovo Dev) with a vulnerability-research instance and a sharper constraint.

Architectural numbers¶

Datum	Value	Where
Repos scanned	"more than fifty"	First section
Surfaces scanned	runtime, edge data path, protocol stack, control plane, OSS deps	Harness section
Concurrent hunters	"typically around fifty at once"	Hunt stage
Per-hunter sub-agents	"a handful" of exploration sub-agents	Hunt stage
Per-task isolation	per-task scratch directory for PoC compile/run	Hunt stage
Single-agent useful coverage on 100k-LoC repo	"maybe a tenth of a percent" before context-window compaction	Generic-coding-agent argument
External SLA reference point	"two-hour SLA from CVE release to patch in production"	What this means for security teams
Model class disclosed for refusal-comparison	Mythos Preview without GA safeguards; "Opus 4.7 or GPT-5.5" as GA reference points	Refusal section
Stages in the harness	8 (Recon, Hunt, Validate, Gapfill, Dedupe, Trace, Feedback, Report)	Harness table

Caveats¶

No quantitative bug counts. The post is an architectural retrospective, not a results paper; specific vulnerability counts, severity distribution, or per-stage noise-reduction percentages are not disclosed.
No per-component cost numbers. Number of hunters and sub-agents are sized as "around fifty" and "a handful"; per-MR-equivalent token spend, per-stage failure rates, and human-triage overhead are unstated.
Mythos Preview is preview-only. Anthropic's GA safeguards (in Opus 4.7 / GPT-5.5) are stated to be absent in the Glasswing-provided model — the architecture lessons port forward to GA models, but the "organic refusal inconsistency" observation is specific to a model without hardened safety boundaries.
The first-person "we tried letting the model write its own patches" anecdote is unquantified. "We watched a few go out that fixed the original bug while quietly breaking something else" — "a few" is the only quantification given.
Project Glasswing organisational details are out-of-band. The post links Anthropic's Glasswing landing page but does not detail the cohort size, partner list, or duration.
Dual-use disclosure framing. The closing section acknowledges the same capabilities accelerate attackers; the post does not detail what (if anything) Cloudflare or Anthropic disclose to public CVE channels from Glasswing-found bugs beyond "every vulnerability surfaced through this work was triaged, validated, and remediated where action was needed under Cloudflare's formal vulnerability management process."

Source¶

Original: https://blog.cloudflare.com/cyber-frontier-models/
Raw markdown: raw/cloudflare/2026-05-18-project-glasswing-what-mythos-showed-us-f1ec214c.md