Split llms.txt per subdirectory

Pattern

For a large documentation site (thousands of pages), publish one llms.txt per top-level directory rather than one global file. The root llms.txt becomes a navigation index pointing to each sub-file; each sub-file fits within an agent's context window.

Companion practices:

  • Remove low-value directory-listing pages from each llms.txt — Cloudflare removed roughly 450 such pages (e.g. workers/databases/, whose only content is a list of links to sub-pages already enumerated elsewhere in llms.txt).
  • Ensure rich titles + descriptions on every entry — these are the agent's "steering wheel" for deciding which doc to fetch. They are drawn from existing page frontmatter, so they require no extra authoring work.
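Both practices can be combined in one generation step. The sketch below is a hypothetical helper (not Cloudflare's actual tooling) that assumes each page carries path, title, and description frontmatter, plus an `is_listing` flag marking low-value directory-listing pages:

```python
from collections import defaultdict

def build_split_llms_txt(pages, base_url):
    """Group pages by top-level directory and emit one llms.txt body per
    directory, plus a root index pointing at each sub-file.

    `pages`: list of dicts with keys path, title, description, and an
    optional is_listing flag (hypothetical frontmatter field) marking
    low-value directory-listing pages to prune."""
    by_dir = defaultdict(list)
    for page in pages:
        if page.get("is_listing"):
            continue  # prune: pure link lists already enumerated in llms.txt
        top = page["path"].strip("/").split("/")[0]
        by_dir[top].append(page)

    files = {}
    for top, entries in sorted(by_dir.items()):
        lines = [
            f"- [{p['title']}]({base_url}/{p['path'].strip('/')}/index.md) — {p['description']}"
            for p in entries
        ]
        files[f"{top}/llms.txt"] = "\n".join(lines)

    # Root file: a navigation index pointing at each per-directory file.
    files["llms.txt"] = "\n".join(
        f"- [{top}]({base_url}/{top}/llms.txt)" for top in sorted(by_dir)
    )
    return files
```

The `/index.md` suffix in each generated entry matches the dynamic-fallback URL shape described below, so agents following these links always receive markdown.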

Motivation

At scale, the naïve "one global llms.txt for the whole site" approach breaks context windows. Cloudflare's developer docs are 5,000+ pages; a single llms.txt covering all of them exceeds every current agent's context window.

The failure mode this triggers is the grep loop: unable to read the whole file, the agent keyword-searches for candidate URLs and refines iteratively — slower, more tokens, narrower context, worse answers.

Splitting per subdirectory fixes this by aligning file size with context-window size. The agent reads the root llms.txt to pick a product directory, fetches that product's llms.txt (which fits in one window), identifies the exact page, and fetches it as markdown via content negotiation in a single call. One linear path, no grep loop.
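The step the agent repeats at each hop is scanning an llms.txt for titles and descriptions, then following the chosen link. A minimal parser sketch for the entry format used below (assumed, not a published spec):

```python
import re

# One entry per line: "- [Title](URL) — description"
ENTRY = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)\s+—\s+(?P<desc>.+)$")

def parse_llms_txt(body):
    """Parse an llms.txt body into (title, url, description) tuples —
    the list an agent scans to decide which file to fetch next."""
    entries = []
    for line in body.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            entries.append((m.group("title"), m.group("url"), m.group("desc")))
    return entries
```

With rich titles and descriptions in place, one pass over this list replaces the iterative keyword search of the grep loop.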

Cloudflare's implementation

developers.cloudflare.com/llms.txt           # Root index — links to per-product files
developers.cloudflare.com/workers/llms.txt   # Workers docs
developers.cloudflare.com/r2/llms.txt        # R2 docs
...                                          # One per top-level product directory

Each sub-llms.txt entry follows the pattern:

- [Page Title](https://.../page/index.md) — concise description in 1-2 sentences

The /index.md URL is a dynamic fallback (implemented via two Cloudflare Transform Rules) that returns markdown regardless of whether the client sent Accept: text/markdown — so llms.txt-following agents always get markdown, not HTML.
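The selection logic behind that fallback can be sketched as follows. This is a simplified model of the behavior, not the actual Transform Rule configuration:

```python
def select_representation(path, accept_header=""):
    """Sketch of the fallback decision: an explicit /index.md suffix forces
    markdown regardless of the Accept header, while other paths serve
    markdown only when the client negotiates for it."""
    if path.endswith("/index.md"):
        return "text/markdown"  # llms.txt-following agents always land here
    if "text/markdown" in accept_header:
        return "text/markdown"  # ordinary content negotiation
    return "text/html"          # default for browsers
```

Because every llms.txt entry links to the `/index.md` form, agents never depend on sending the right Accept header.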

Benchmark evidence

Cloudflare's Kimi-k2.5/OpenCode benchmark, compared against an average non-refined technical documentation site:

  • 31% fewer tokens.
  • 66% faster to the correct answer.

Both gains are attributed to the combination of this pattern with patterns/dynamic-index-md-fallback, concepts/markdown-content-negotiation, and rich per-entry descriptions.

When to apply

  • Documentation corpus large enough that a single llms.txt exceeds typical-agent context windows (empirically: several thousand pages).
  • Natural top-level partitioning exists (per-product, per-API, per-service).
  • Directory index pages contribute no semantic content beyond the link list already in llms.txt.

When not to apply

  • Small sites (< a few hundred pages) — one llms.txt fits; the per-directory split is overhead.
  • No natural top-level hierarchy — arbitrary partitioning makes the root file hard to navigate.
  • Agent context windows are known to be large enough (e.g. Gemini 1M-token workloads might favor the single-file variant for its completeness).
