
CONCEPT Cited by 2 sources

Grep loop

Definition

The grep loop is an agent failure mode where a documentation set that exceeds the agent's context window forces the agent into iterative grep-style keyword search across the corpus rather than reading the relevant section directly.

Named after Unix grep: the agent behaves like a human impatiently grep-ing through unfamiliar source code. It guesses a keyword, inspects the hits, refines the keyword, and tries again until it finds the answer or gives up and answers imprecisely.
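The loop can be sketched as a toy simulation. Everything here is hypothetical: the three-section corpus, the keyword guesses, and the grep and grep_loop helpers are illustrative stand-ins, not any real agent's tooling. What it shows is that only the sentences matching a query ever reach the agent's context.

```python
import re

# Hypothetical toy corpus: section path -> section text.
CORPUS = {
    "auth/tokens": "Tokens expire after a configurable TTL. Rotate tokens with the rotate command.",
    "auth/errors": "A 401 means the token is missing. A 403 means the token lacks scope.",
    "billing/overview": "Usage is metered per request.",
}

def grep(corpus, keyword):
    """Return only the sentences that match keyword, as (section, sentence) snippets."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for section, text in corpus.items():
        for sentence in re.split(r"(?<=\.)\s+", text):
            if pattern.search(sentence):
                hits.append((section, sentence))
    return hits

def grep_loop(corpus, keywords, is_answer):
    """Try each guessed keyword in turn until a snippet satisfies is_answer.

    Returns (answer or None, iterations used, every snippet that entered context).
    """
    context = []
    for i, keyword in enumerate(keywords, start=1):
        hits = grep(corpus, keyword)
        context.extend(hits)  # only matching snippets enter context
        for _, sentence in hits:
            if is_answer(sentence):
                return sentence, i, context
    return None, len(keywords), context
```

Calling grep_loop(CORPUS, ["forbidden", "permission", "403"], lambda s: "403" in s) needs three iterations, and the neighbouring sentence about 401 never enters context because no query matched it.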

Cloudflare's framing (2026-04-17)

Three failure-mode properties, each compounding:

  1. Cannot read the whole file. The llms.txt file is too large to fit in the context window, so the agent grep-searches it for keywords instead.
  2. Narrowed context → lower accuracy. "When an agent relies on iterative searching rather than reading the full file, it loses the broader context of the documentation at hand. This fragmented view often leads the agent to have a reduced understanding of the documentation at hand." A concept that lives in an adjacent section never surfaces, because no query matched it.
  3. Latency and token bloat. Each iteration burns "new thinking tokens" plus another search round-trip, so user-visible latency accumulates and the total cost exceeds what a single read of the right document would have cost.
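The compounding in point 3 is simple arithmetic. A minimal sketch with made-up per-step numbers (assumptions for illustration only, not Cloudflare's measurements); Step and total_cost are hypothetical names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    thinking_tokens: int  # tokens the model spends deciding what to do next
    result_tokens: int    # tokens of search hits or document text added to context
    latency_s: float      # wall-clock seconds for the tool round-trip

def total_cost(steps):
    """Sum tokens and latency over a sequence of agent steps."""
    tokens = sum(s.thinking_tokens + s.result_tokens for s in steps)
    latency = sum(s.latency_s for s in steps)
    return tokens, latency

# Hypothetical numbers chosen only to show how per-iteration costs compound.
grep_loop_steps = [Step(thinking_tokens=400, result_tokens=300, latency_s=2.0) for _ in range(6)]
single_read_steps = [Step(thinking_tokens=400, result_tokens=3000, latency_s=3.0)]

print(total_cost(grep_loop_steps))    # (4200, 12.0)
print(total_cost(single_read_steps))  # (3400, 3.0)
```

With these numbers a single grep iteration (700 tokens) is cheaper than one full read, so the single read only wins once the loop needs several rounds, which is exactly the situation the grep loop describes. Points 1 and 2 still cost accuracy regardless of the token count.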

Why it matters specifically for agents

For a human developer, grep-ing through a large codebase is fine — a human knows the surrounding context from prior experience. For an agent with a fresh context window and no out-of-session memory, grep-ing is a worst-case form of reading: only snippets that match the query enter context; the surrounding mental model doesn't.

Canonical structural fix

Split the corpus into chunks that each fit in a context window. Cloudflare's 2026-04-17 canonical answer is one llms.txt per top-level directory, with the root llms.txt pointing to each of them; this is captured as patterns/split-llms-txt-per-subdirectory. Each per-directory llms.txt fits in a single context window, so the agent reads the index once, identifies the exact product doc it needs, and fetches that doc via markdown content negotiation. The path is single and linear, with no grep loop.
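A minimal sketch of generating such a split index, assuming a flat mapping of page paths to titles and descriptions. The PAGES data, the docs.example.com host, the output layout, and the four-characters-per-token estimate are all assumptions; the markdown link-list shape loosely follows the llms.txt proposal rather than reproducing Cloudflare's actual files.

```python
from collections import defaultdict

# Hypothetical page index: path -> (title, description).
PAGES = {
    "compute/get-started": ("Get started", "Create and deploy a first service."),
    "compute/limits": ("Limits", "Per-request CPU and memory limits."),
    "storage/buckets": ("Buckets", "Create and configure storage buckets."),
}
BASE_URL = "https://docs.example.com"  # hypothetical docs host

def approx_tokens(text):
    # Rough heuristic (assumption): about 4 characters per token.
    return len(text) // 4

def split_llms_txt(pages, base_url, token_budget):
    """Build one llms.txt per top-level directory plus a root index pointing to each."""
    by_dir = defaultdict(list)
    for path, (title, desc) in sorted(pages.items()):
        top = path.split("/", 1)[0]
        by_dir[top].append(f"- [{title}]({base_url}/{path}): {desc}")
    files = {}
    root = ["# Documentation", ""]
    for top, entries in by_dir.items():
        body = "\n".join([f"# {top}", ""] + entries) + "\n"
        # Each per-directory file must fit in one context window.
        if approx_tokens(body) > token_budget:
            raise ValueError(f"{top}/llms.txt exceeds the token budget; split further")
        files[f"{top}/llms.txt"] = body
        root.append(f"- [{top}]({base_url}/{top}/llms.txt)")
    files["llms.txt"] = "\n".join(root) + "\n"
    return files
```

The budget check is the design point: if a directory's index would overflow the window, the sketch refuses to emit it rather than silently recreating the grep-loop condition one level down.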

Complementary practices:

  • Remove directory-listing pages: they cost tokens and carry no semantic content.
  • Give every index entry a rich title and description; these are what the agent steers by.
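Both practices can be expressed as a lint pass over index entries. A sketch under assumptions: IndexEntry, the is_directory_listing flag, and the five-word MIN_DESCRIPTION_WORDS threshold are hypothetical choices, not part of any published tool.

```python
from dataclasses import dataclass

@dataclass
class IndexEntry:
    path: str
    title: str
    description: str
    is_directory_listing: bool = False  # e.g. an auto-generated "pages in this folder" page

MIN_DESCRIPTION_WORDS = 5  # hypothetical threshold for a "rich" description

def clean_index(entries):
    """Drop directory-listing pages and report entries with thin metadata."""
    kept, problems = [], []
    for e in entries:
        if e.is_directory_listing:
            continue  # costs tokens, carries no semantic content
        kept.append(e)
        if not e.title.strip():
            problems.append(f"{e.path}: missing title")
        if len(e.description.split()) < MIN_DESCRIPTION_WORDS:
            problems.append(f"{e.path}: description too thin")
    return kept, problems
```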

Benchmark evidence

Cloudflare's benchmark, running the Kimi-k2.5 model in the OpenCode agent against its own documentation and against the llms.txt files of other large technical documentation sites, reported:

  • 31% fewer tokens.
  • 66% faster to the correct answer.

Cloudflare attributes both gains to avoiding the grep loop.

Related

  • Context-window exhaustion proper: the total in-context content exceeds the window's capacity. The grep loop is the behaviour that results when exhaustion forces truncation or paging.
  • Context engineering: the discipline of choosing what enters the context window. The grep loop is a symptom of poor context engineering on the documentation author's side.

Seen in

  • sources/2026-04-17-cloudflare-introducing-the-agent-readiness-score-is-your-site-agent-ready: canonical wiki instance. Cloudflare dogfoods the pattern on developers.cloudflare.com, which is explicitly designed to avoid the grep loop; this is the source of the benchmark evidence above.
  • sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings: paired inverse framing. Vercel's Knowledge Agent Template names agentic grep as the desired retrieval primitive, not a failure mode. The distinguishing axis is a bounded corpus loaded into a sandbox versus an unbounded web-doc corpus. When the agent has bash tools against a snapshot repo that fits in the sandbox filesystem, agentic grep, find, and cat become fast, deterministic, and traceable ("LLMs already understand filesystems ... you're not teaching the model a new skill; you're using the one it's best at"). Cloudflare's grep-loop critique still holds when the documentation set is too large to fit in one context window; Vercel's filesystem pattern applies when the sandbox can load the entire corpus. The two framings coexist because they address retrieval at different corpus-boundary altitudes.
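On Vercel's side of the axis, agentic grep is a recursive search over a local snapshot. Below is a minimal Python stand-in for grep -rn, plus a two-way strategy choice mirroring the corpus-boundary distinction above. agentic_grep, choose_strategy, and the *.md default glob are hypothetical names and defaults; a real agent in the Vercel template would run the actual grep, find, and cat tools instead.

```python
import re
from pathlib import Path

def agentic_grep(root, pattern, glob="*.md"):
    """Rough stand-in for `grep -rn PATTERN --include=GLOB ROOT`.

    Returns (relative_path, line_number, line) for every matching line.
    """
    base = Path(root)
    rx = re.compile(pattern)
    hits = []
    for path in sorted(base.rglob(glob)):
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if rx.search(line):
                hits.append((path.relative_to(base).as_posix(), number, line))
    return hits

def choose_strategy(corpus_fits_in_sandbox):
    """Hypothetical rule mirroring the corpus-boundary axis described above."""
    if corpus_fits_in_sandbox:
        return "agentic grep/find/cat over a sandbox snapshot"
    return "split llms.txt index, then fetch one page"
```

Because the snapshot is bounded and local, each search is cheap and deterministic, and every hit carries a file and line number the agent can follow up with a full read; that traceability is what turns grep from a failure mode into a primitive.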
Last updated · 542 distilled / 1,571 read