Split llms.txt per subdirectory

Pattern

For a large documentation site (thousands of pages), publish one llms.txt per top-level directory rather than one global file. The root llms.txt becomes a navigation index pointing to each sub-file; each sub-file fits within an agent's context window.

Companion practices:

  • Remove low-value directory-listing pages from each llms.txt — Cloudflare removed roughly 450 such pages (e.g. workers/databases/, whose only content is a list of links to sub-pages already enumerated elsewhere in llms.txt).
  • Ensure rich titles + descriptions on every entry — these are the agent's "steering wheel" for deciding which doc to fetch. They are drawn from existing page frontmatter, so they require no extra authoring work.
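Both practices can be combined in one generation step. The sketch below is a hypothetical helper (not Cloudflare's actual tooling) that assumes each page carries path, title, and description frontmatter, plus an `is_listing` flag marking low-value directory-listing pages:

```python
from collections import defaultdict

def build_split_llms_txt(pages, base_url):
    """Group pages by top-level directory and emit one llms.txt body per
    directory, plus a root index pointing at each sub-file.

    `pages`: list of dicts with keys path, title, description, and an
    optional is_listing flag (hypothetical frontmatter field) marking
    low-value directory-listing pages to prune."""
    by_dir = defaultdict(list)
    for page in pages:
        if page.get("is_listing"):
            continue  # prune: pure link lists already enumerated in llms.txt
        top = page["path"].strip("/").split("/")[0]
        by_dir[top].append(page)

    files = {}
    for top, entries in sorted(by_dir.items()):
        lines = [
            f"- [{p['title']}]({base_url}/{p['path'].strip('/')}/index.md) — {p['description']}"
            for p in entries
        ]
        files[f"{top}/llms.txt"] = "\n".join(lines)

    # Root file: a navigation index pointing at each per-directory file.
    files["llms.txt"] = "\n".join(
        f"- [{top}]({base_url}/{top}/llms.txt)" for top in sorted(by_dir)
    )
    return files
```

The `/index.md` suffix in each generated entry matches the dynamic-fallback URL shape described below, so agents following these links always receive markdown.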

Motivation

At scale, the naïve "one global llms.txt for the whole site" approach breaks context windows. Cloudflare's developer docs are 5,000+ pages; a single llms.txt covering all of them exceeds every current agent's context window.

The failure mode this triggers is the grep loop: unable to read the whole file, the agent keyword-searches for candidate URLs and refines iteratively — slower, more tokens, narrower context, worse answers.

Splitting per subdirectory fixes this by aligning file size with context-window size. The agent reads the root llms.txt to pick a product directory, fetches that product's llms.txt (which fits in one window), identifies the exact page, and fetches it as markdown via content negotiation in a single call. One linear path, no grep loop.
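The step the agent repeats at each hop is scanning an llms.txt for titles and descriptions, then following the chosen link. A minimal parser sketch for the entry format used below (assumed, not a published spec):

```python
import re

# One entry per line: "- [Title](URL) — description"
ENTRY = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)\s+—\s+(?P<desc>.+)$")

def parse_llms_txt(body):
    """Parse an llms.txt body into (title, url, description) tuples —
    the list an agent scans to decide which file to fetch next."""
    entries = []
    for line in body.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            entries.append((m.group("title"), m.group("url"), m.group("desc")))
    return entries
```

With rich titles and descriptions in place, one pass over this list replaces the iterative keyword search of the grep loop.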

Cloudflare's implementation

developers.cloudflare.com/llms.txt           # Root index — links to per-product files
developers.cloudflare.com/workers/llms.txt   # Workers docs
developers.cloudflare.com/r2/llms.txt        # R2 docs
...                                          # One per top-level product directory

Each sub-llms.txt entry follows the pattern:

- [Page Title](https://.../page/index.md) — concise description in 1-2 sentences

The /index.md URL is a dynamic fallback (implemented via two Cloudflare Transform Rules) that returns markdown regardless of whether the client sent Accept: text/markdown — so llms.txt-following agents always get markdown, not HTML.
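The selection logic behind that fallback can be sketched as follows. This is a simplified model of the behavior, not the actual Transform Rule configuration:

```python
def select_representation(path, accept_header=""):
    """Sketch of the fallback decision: an explicit /index.md suffix forces
    markdown regardless of the Accept header, while other paths serve
    markdown only when the client negotiates for it."""
    if path.endswith("/index.md"):
        return "text/markdown"  # llms.txt-following agents always land here
    if "text/markdown" in accept_header:
        return "text/markdown"  # ordinary content negotiation
    return "text/html"          # default for browsers
```

Because every llms.txt entry links to the `/index.md` form, agents never depend on sending the right Accept header.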

Benchmark evidence

Cloudflare's Kimi-k2.5/OpenCode benchmark, compared against an average non-refined technical documentation site:

  • 31% fewer tokens.
  • 66% faster to the correct answer.

Both gains are attributed to the combination of this pattern with patterns/dynamic-index-md-fallback, concepts/markdown-content-negotiation, and rich per-entry descriptions.

When to apply

  • Documentation corpus large enough that a single llms.txt exceeds typical-agent context windows (empirically: several thousand pages).
  • Natural top-level partitioning exists (per-product, per-API, per-service).
  • Directory index pages contribute no semantic content beyond the link list already in llms.txt.

When not to apply

  • Small sites (< a few hundred pages) — one llms.txt fits; the per-directory split is overhead.
  • No natural top-level hierarchy — arbitrary partitioning makes the root file hard to navigate.
  • Agent context windows are known to be large enough (e.g. Gemini 1M-token workloads might favor the single-file variant for its completeness).
