PATTERN Cited by 2 sources
Multi-stage vulnerability discovery harness¶
Intent¶
Coordinate AI-driven vulnerability research at coverage by splitting it into a stage pipeline of independent agent populations, each with a tight scope and a specific output contract. The pipeline replaces "point a coding agent at the repo" — which fails by single-agent coverage failure — with many narrow agents in parallel, with adversarial validation, dedup, and reachability tracing as separate stages.
Canonical wiki instance¶
Cloudflare's vulnerability discovery harness, disclosed 2026-05-18 in Project Glasswing: what Mythos showed us. Used to scan "more than fifty" Cloudflare repositories spanning runtime, edge data path, protocol stack, control plane, and OSS dependencies, with ~50 hunters running concurrently.
The 8 stages¶
| # | Stage | Job | Output |
|---|---|---|---|
| 1 | Recon | Top-down repo read; subagents per subsystem; produce architecture document covering build commands, trust boundaries, entry points, attack surface; generate initial task queue. | Architecture doc + task queue |
| 2 | Hunt | Per-task: 1 attack class + scope hint. ~50 hunters concurrent; each fans out to a handful of exploration subagents; per-task scratch directory for compile-and-run PoC (patterns/proof-by-compile-and-run). | Findings with PoCs |
| 3 | Validate | Independent agent re-reads code, tries to disprove the finding. Different prompt, different model, no ability to emit new findings. | Confirmed / dropped flags |
| 4 | Gapfill | Hunters flag areas they touched but didn't cover thoroughly; those re-queued. | Augmented hunt queue |
| 5 | Dedupe | Findings sharing the same root cause collapse into a single record. | Unique findings |
| 6 | Trace | For each confirmed finding in a shared library, fan out one tracer agent per consumer repo; cross-repo symbol index; decide reachability. | Reachable / unreachable annotation |
| 7 | Feedback | Reachable traces become new hunt tasks in the consumer repo. | New tasks (loop-close) |
| 8 | Report | Reporting agent writes against schema; fixes its own validation errors; submits to ingest API. | Schema-valid records |
Cloudflare's per-stage rationale (verbatim)¶
| Stage | "Why it matters" |
|---|---|
| Recon | "Gives every downstream agent shared context. Cuts the wander problem." |
| Hunt | "This is where most of the work happens. Many narrow tasks in parallel, not one exhaustive agent." |
| Validate | "Catches a meaningful fraction of the noise the hunter wouldn't catch when reviewing its own work." |
| Gapfill | "Counteracts the model's tendency to drift toward attack classes it has already had success with." |
| Dedupe | "Variant analysis is a feature, not a way to inflate the queue with duplicates." |
| Trace | "Turns 'there is a flaw' into 'there is a reachable vulnerability.' This is the stage that matters most." |
| Feedback | "Closes the loop. The pipeline gets better as it runs." |
| Report | "Output is queryable data, not free-form prose." |
Architectural logic — why each stage exists¶
The pipeline embeds four design lessons into stage boundaries:
- Narrow scope is enforced by the Recon → Hunt boundary. Recon produces the scope; Hunt agents inherit it.
- Adversarial review is the Hunt → Validate boundary. The validator cannot emit; only the hunter can.
- Split bug and reachability is the Hunt → Trace boundary. Hunt asks "is this code buggy?"; Trace asks "can an attacker reach it?". They are different agents.
- Parallel narrow tasks is what makes the Hunt stage scale: ~50 hunters concurrent, not one exhaustive agent.
Why a pipeline shape, not a single mega-agent¶
Three properties the staging buys:
- Context-budget isolation per stage. Each stage runs in its own agent session(s), with a fresh context window. Findings persist between stages as data, not within one agent's context — sidestepping the single-agent coverage failure.
- Independence of validators. The validator stage's value comes from not sharing the hunter's history. A single mega-agent that does both produces a self-review, not adversarial review.
- Per-stage tool-surface narrowing. Hunt agents need compile-and-run tools (sandbox); Tracer agents need a cross-repo symbol index; Report agents need a schema validator and an ingest API. Per-stage tool surfaces improve agent accuracy and reduce blast radius if any one agent misbehaves.
Sibling pattern shape: AI Code Review¶
The same coordinator/sub-reviewer architecture shape appears in Cloudflare AI Code Review applied to MR-time code review. The 2026-05-18 vulnerability discovery harness is the second internal Cloudflare AI pipeline at coverage the wiki has visibility into; together they support the generalisation that the coordinator/sub-reviewer shape is Cloudflare's house style for AI orchestration in security-adjacent domains.
When to reach for this pattern¶
- AI-driven analysis at coverage on large surfaces (vulnerability research, codebase audit, security review, performance hotspot search, license / compliance audit).
- Output requires structured records consumed by downstream systems.
- False-positive control matters more than maximum recall.
- Reachability / consumer-context matters (analysis spans shared libraries with multiple consumers).
When not to reach for it¶
- Single-hypothesis tasks (one bug to fix, one feature to build, one specific question to answer). The harness's fixed cost (Recon + per-stage agents) overwhelms the benefit.
- Tasks that fit comfortably in a single-agent context window. The harness adds orchestration overhead.
- Tasks where reachability is irrelevant or where the reporting schema is an unnecessary constraint.
Open / not disclosed¶
- Per-stage cost share — Cloudflare doesn't break down what fraction of total tokens goes to each stage.
- How the inter-stage queue is implemented — message-bus, file-system, durable workflow, etc.
- Failure handling — what happens when a stage fails mid-run; is there per-task retry?
Seen in¶
- sources/2026-05-18-cloudflare-project-glasswing-what-mythos-showed-us — canonical wiki instance.
Related¶
- systems/cloudflare-vulnerability-discovery-harness — the canonical implementation.
- systems/mythos-preview — the engine inside the hunters.
- systems/cloudflare-ai-code-review — sibling multi-agent harness in a different domain.
- patterns/coordinator-sub-reviewer-orchestration — the generalised orchestration pattern.
- patterns/narrow-scoped-agent-task, patterns/parallel-narrow-agents-over-exhaustive, patterns/split-bug-and-reachability-questions, patterns/proof-by-compile-and-run, patterns/cross-repo-tracer-fan-out, patterns/gapfill-requeue-for-coverage, patterns/report-agent-self-validates-schema, patterns/adversarial-review-subagent — the per-stage patterns the harness composes.
- concepts/signal-to-noise-in-ai-vulnerability-triage — the failure mode the multi-stage shape exists to control.
- concepts/single-agent-coverage-failure-on-large-repos — the failure mode the multi-stage shape exists to bypass.