PATTERN Cited by 1 source
Split llms.txt per subdirectory¶
Pattern¶
For a large documentation site (thousands of pages), publish
one llms.txt per top-level directory
rather than one global file. The root llms.txt becomes a
navigation index pointing to each sub-file; each sub-file fits
within an agent's context window.
Companion practices:
- Remove low-value directory-listing pages from each llms.txt: ~450 pages were removed in Cloudflare's instance (e.g. workers/databases/, whose only content is a list of links to sub-pages already enumerated elsewhere in llms.txt).
- Ensure rich titles + descriptions on every entry: they are the agent's "steering wheel" for deciding which doc to fetch. They are drawn from existing page frontmatter, so they need no extra work.
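The split and the first companion practice can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cloudflare's build code: the `Page` record, the `is_listing` flag, the `split_llms_txt` helper, and the `docs.example.com` base URL are all hypothetical, and the `- [title](url): description` entry shape is assumed from common llms.txt usage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Page:
    path: str                 # site-relative path, e.g. "/workers/get-started/"
    title: str                # taken from page frontmatter
    description: str          # taken from page frontmatter
    is_listing: bool = False  # hypothetical flag: page is only a list of links


def split_llms_txt(pages, base_url):
    """Return {file path: file text}: one root index plus one file per top-level directory."""
    groups = defaultdict(list)
    for page in pages:
        if page.is_listing:
            continue  # drop low-value directory-listing pages
        top = page.path.strip("/").split("/")[0]
        if not top:
            continue  # the site root page belongs to no product directory
        groups[top].append(page)

    files = {}
    root_lines = ["# Documentation index", ""]
    for top in sorted(groups):
        # The root file only points at the per-directory files.
        root_lines.append(f"- [{top}]({base_url}/{top}/llms.txt)")
        lines = [f"# {top}", ""]
        for page in sorted(groups[top], key=lambda p: p.path):
            lines.append(f"- [{page.title}]({base_url}{page.path}): {page.description}")
        files[f"/{top}/llms.txt"] = "\n".join(lines) + "\n"
    files["/llms.txt"] = "\n".join(root_lines) + "\n"
    return files
```

Grouping on the first path segment keeps the root file proportional to the number of products rather than the number of pages, which is what lets it stay small as the corpus grows.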
Motivation¶
At scale, the naïve "one global llms.txt for the whole site"
approach breaks context windows. Cloudflare's developer docs
are 5,000+ pages; a single llms.txt covering all of them
exceeds every current agent's context window.
The failure mode this triggers is the grep loop: unable to read the whole file, the agent keyword-searches for candidate URLs and refines iteratively, which is slower, costs more tokens, narrows its context, and produces worse answers.
Splitting per subdirectory fixes this by aligning file size
with context-window size. The agent reads the root llms.txt
to pick a product directory, fetches the product's llms.txt
(fits in one window), identifies the exact page, fetches it
markdown-negotiation-style in one call. Single linear path, no
grep loop.
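The linear path described above (root index, then product file, then page) can be sketched as three fetches. This is an illustrative toy, not how any particular agent works: `parse_entries`, `pick`, and `find_page` are hypothetical helpers, the word-overlap scoring stands in for the model's own judgment, and the `- [title](url): description` entry shape is an assumption.

```python
import re
from typing import Callable

# Matches "- [title](url)" with an optional ": description" suffix.
ENTRY = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$")


def parse_entries(llms_txt: str) -> list:
    """Parse link-list lines into dicts; headings and blank lines are skipped."""
    entries = []
    for line in llms_txt.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            entries.append({"title": m["title"], "url": m["url"], "desc": m["desc"] or ""})
    return entries


def pick(entries: list, question: str) -> dict:
    """Toy relevance: count whole words shared by the question and title + description."""
    wanted = set(re.findall(r"\w+", question.lower()))

    def score(entry):
        text = f"{entry['title']} {entry['desc']}".lower()
        return len(wanted & set(re.findall(r"\w+", text)))

    return max(entries, key=score)


def find_page(question: str, root_url: str, fetch: Callable[[str], str]) -> str:
    """Three fetches, no searching: root index -> product llms.txt -> page."""
    product = pick(parse_entries(fetch(root_url)), question)
    page = pick(parse_entries(fetch(product["url"])), question)
    return fetch(page["url"])
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library; an agent harness would supply its own fetch tool there.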
Cloudflare's implementation¶
developers.cloudflare.com/llms.txt # Root index — links to per-product files
developers.cloudflare.com/workers/llms.txt # Workers docs
developers.cloudflare.com/r2/llms.txt # R2 docs
... # One per top-level product directory
Each sub-llms.txt entry follows the pattern:
The /index.md URL is a
dynamic fallback
(implemented via two Cloudflare Transform Rules) that returns
markdown regardless of whether the client sent
Accept: text/markdown — so llms.txt-following agents always
get markdown, not HTML.
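As a rough illustration of how an entry pointing at the /index.md URL might be generated from page frontmatter, consider the sketch below. The exact entry pattern Cloudflare uses is not reproduced here, so the `- [title](url): description` shape is an assumption based on common llms.txt usage, and `markdown_url`, `format_entry`, and the `docs.example.com` URL are hypothetical names. The sketch only builds the link; the server-side fallback itself is out of scope.

```python
def markdown_url(page_url: str) -> str:
    """Append index.md to a directory-style page URL."""
    return page_url.rstrip("/") + "/index.md"


def format_entry(page_url: str, frontmatter: dict) -> str:
    """Build one llms.txt entry from a page's frontmatter (assumed keys: title, description)."""
    title = frontmatter["title"]
    description = frontmatter.get("description", "").strip()
    link = f"- [{title}]({markdown_url(page_url)})"
    # Omit the ": description" suffix when the page has no description.
    return f"{link}: {description}" if description else link
```

For example, `format_entry("https://docs.example.com/workers/get-started/", {"title": "Get started", "description": "Deploy a first Worker."})` yields an entry whose link ends in `/workers/get-started/index.md`.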
Benchmark evidence¶
Cloudflare's Kimi-k2.5/OpenCode benchmark, compared against an average non-refined technical documentation site:
- 31 % fewer tokens.
- 66 % faster to correct answer.
Both are attributed to the combination of this pattern with patterns/dynamic-index-md-fallback, concepts/markdown-content-negotiation, and rich per-entry descriptions.
When to apply¶
- Documentation corpus large enough that a single llms.txt exceeds typical agent context windows (empirically: several thousand pages).
- Natural top-level partitioning exists (per-product, per-API, per-service).
- Directory index pages contribute no semantic content beyond the link list already in llms.txt.
When not to apply¶
- Small sites (fewer than a few hundred pages): one llms.txt fits; the per-directory split is overhead.
- No natural top-level hierarchy: arbitrary partitioning makes the root file hard to navigate.
- Agent context windows are known to be large enough (e.g. Gemini 1M-token workloads might favor the single-file variant for its completeness).
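The size criterion in the two lists above can be turned into a rough check. This is a back-of-the-envelope sketch, not a measured rule: the 4-characters-per-token ratio is a common approximation for English text, and `budget_fraction` (how much of the window the index may occupy before the agent has room for nothing else) is a hypothetical knob, as are the function names.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; real tokenizers vary by model and content."""
    return int(len(text) / chars_per_token)


def should_split(llms_txt: str, context_window_tokens: int, budget_fraction: float = 0.5) -> bool:
    """True when a single llms.txt would consume more than the allowed share of the window."""
    return estimate_tokens(llms_txt) > context_window_tokens * budget_fraction
```

A site that stays under the budget for its target agents can keep the single-file variant; one that exceeds it is a candidate for the per-directory split, provided a natural top-level hierarchy exists.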
Seen in¶
- sources/2026-04-17-cloudflare-introducing-the-agent-readiness-score-is-your-site-agent-ready — canonical wiki instance. Cloudflare's developer docs are split at top-level product directories (~450 directory-listing pages removed, measured 31 % token / 66 % time improvement).
Related¶
- concepts/llms-txt — the file this pattern splits.
- concepts/grep-loop — the failure mode the pattern prevents.
- concepts/agent-context-window — the constraint shaping the split size.
- concepts/machine-readable-documentation — parent posture.
- patterns/dynamic-index-md-fallback — sibling pattern; each split-llms.txt entry typically points at the /index.md URL.
- systems/cloudflare-developer-documentation — reference implementation.