Skip to content

PATTERN Cited by 1 source

Gapfill requeue for coverage

Intent

Have agents self-report coverage gaps in their own work output, then automatically re-queue those gaps as new narrow tasks for the next pass. Counteracts the model's tendency to drift toward attack classes (or domains, or hypotheses) where it has already had success and away from those where it hasn't.

Canonical articulation

Cloudflare's Gapfill stage of the vulnerability discovery harness (Source: sources/2026-05-18-cloudflare-project-glasswing-what-mythos-showed-us):

"Hunters flag areas they touched but didn't cover thoroughly. Those areas get re-queued for another pass. Counteracts the model's tendency to drift toward attack classes it has already had success with."

The two-step shape

  1. Self-report. Hunters add a "covered shallowly: " annotation alongside their findings — explicitly declaring places they touched but didn't fully investigate.
  2. Re-queue. A Gapfill stage harvests those annotations and adds them as new tasks to the Hunt queue, possibly with adjusted scope hints (different attack class, deeper trust-boundary context, etc.) so the next agent doesn't repeat the shallow pass.

Why agent self-reporting works here

The hunter agent often knows it didn't cover an area thoroughly — "there's a use of this function I didn't investigate; I'm focused on command injection but a buffer overflow check would also apply". The information exists inside the agent's reasoning trace; the pattern is just asking the agent to surface it as a structured output.

This is a workable form of calibrated self-assessment. The wiki has a sibling concept, concepts/model-bias-toward-finding-something, that emphasises models don't reliably abstain; the Gapfill pattern works because it asks for a different signal — coverage gaps rather than findings — which the model can report without triggering the find-something bias.

What the pattern counteracts

Cloudflare's verbatim: "Counteracts the model's tendency to drift toward attack classes it has already had success with."

The drift is a sibling failure mode to agent hyperfixation — but operating across task batches rather than within a single agent session. When agents have recently found buffer overflows, subsequent agents (sharing the same model bias) are more likely to look for buffer overflows again. Without an explicit gapfill mechanism, the queue concentrates around "fashionable" attack classes and under-covers others.

Why a separate stage instead of fixing in-prompt

A "be exhaustive" instruction in the original prompt has known failure modes:

  • Model bias toward emission overrides the exhaustive-instruction. The model still drifts.
  • Long prompts hit context budget. Adding "and also consider buffer overflows, race conditions, integer overflows, command injection, …" expands the prompt for every task even when most are irrelevant.
  • No closure mechanism. A single-agent "be exhaustive" doesn't have a structural way to verify exhaustiveness.

A separate Gapfill stage decouples coverage from individual agent runs and makes coverage a pipeline property — the queue accumulates gap-tasks until the pipeline drains them.

Sibling patterns on the wiki

Generalises beyond vulnerability research

The pattern shape — agent self-reports incompleteness; incompleteness becomes a new task — applies anywhere exhaustive coverage is desired but not feasible per-session:

  • Code review: reviewer flags "I focused on the auth changes; the migration script also changed and I didn't fully review it" — flag becomes a new review task.
  • Document analysis: analyser flags "I extracted product-features but the document also has a pricing section I didn't extract".
  • Test coverage: agent flags "I added tests for the happy path; the error-handling branches are uncovered".

In all cases the gap is a structured output the model emits alongside its primary work, not a separate analysis pass.

Cost / requirements

  • Per-task structured-output schema that includes a gaps field — without it, the model has no place to emit the signal.
  • Prompt discipline — explicitly asking for gaps, framed in a way that the model treats as separate from emitting findings (so the find-something bias doesn't contaminate the gap report).
  • Queue dedup — when many hunters report the same gap, the gap should appear in the queue once, not N times. Reuses the Dedupe stage of the harness.
  • Termination criterion — the gap-loop has to halt; otherwise gaps spawn gaps spawn gaps. Cloudflare doesn't disclose theirs (per-area cap, per-batch budget, or diminishing-returns heuristic are all plausible).

Open / not disclosed

  • Gap-task prioritisation — when gaps re-queue, do they go to the front of the queue, the back, or distributed across batches?
  • Quantitative coverage uplift from Gapfill — Cloudflare doesn't measure how much additional coverage Gapfill adds vs running an extra round of fresh tasks.
  • Termination — the post doesn't disclose how the loop exits.

Seen in

Last updated · 542 distilled / 1,571 read