llms.txt

Definition

llms.txt is a plain-text file at the root of a website (e.g. https://example.com/llms.txt) that gives an LLM agent a structured reading list: what the site is, what's on it, and where the important content lives — a sitemap written for an LLM to read rather than a crawler to index.

Proposed in September 2024 by Jeremy Howard and documented at llmstxt.org. The format is a human-readable markdown skeleton:

# My Site

> A developer platform for building on the edge.

## Documentation
- [Getting Started](https://example.com/docs/start.md)
- [API Reference](https://example.com/docs/api.md)

## Changelog
- [Release Notes](https://example.com/changelog.md)

Each entry is a markdown link to the resource (or its markdown representation); category headings group related resources.

Why it exists

Agents have bounded context windows. Serving the full site in raw HTML is prohibitively expensive in tokens, and asking the agent to crawl the sitemap XML and then render every HTML candidate wastes tokens at every step. llms.txt lets the site author pre-select what's worth reading and in what structure: author-side curation rather than an agent-side heuristic.

Distinction from markdown content negotiation (Accept: text/markdown)

llms.txt answers "what should I read?"; content negotiation answers "give me this specific URL as markdown." They compose: a well-set-up site exposes llms.txt listing markdown URLs, and also lets any URL be fetched as markdown via content negotiation (with a /index.md fallback for agents that don't yet send Accept: text/markdown).
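The client side of that composition can be sketched as a fetch-with-fallback. The helper below is an assumption (function names and fallback convention are illustrative, not a published API): try the Accept header first, and if the server ignores it, fall back to the /index.md sibling.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

def markdown_fallback_url(url: str) -> str:
    """Conventional /index.md sibling of a page URL (assumed convention)."""
    base = url if url.endswith("/") else url + "/"
    return urljoin(base, "index.md")

def is_markdown_response(content_type: str) -> bool:
    """True if the Content-Type suggests the server honored the Accept header."""
    return "markdown" in content_type or content_type.startswith("text/plain")

def fetch_markdown(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL as markdown, preferring content negotiation.

    1. Request with 'Accept: text/markdown'.
    2. If the response is still HTML, retry the /index.md fallback.
    """
    req = Request(url, headers={"Accept": "text/markdown"})
    with urlopen(req, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        ctype = resp.headers.get("Content-Type", "")
    if is_markdown_response(ctype):
        return body
    with urlopen(Request(markdown_fallback_url(url)), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The two small helpers carry the decision logic; the network call is just plumbing around them.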

Failure mode: the giant-file problem

Large documentation sites naïvely generate one global llms.txt covering every page. Cloudflare's docs, at roughly 5,000+ pages, would produce a file that exceeds most agents' context windows, forcing the agent into grep-loop iteration over keyword searches: slower, lossier, and more expensive than reading a right-sized file whole.

Cloudflare's canonical answer (2026-04-17): one llms.txt per top-level directory, with the root file pointing to each sub-file. Captured as patterns/split-llms-txt-per-subdirectory. Pair this with removing directory-listing pages (no semantic content, only token cost) and with rich per-entry titles and descriptions (the agent's steering wheel).
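The split pattern can be sketched as a generator that groups pages by top-level path segment and emits one sub-file per group plus a root index. Everything here is an assumption for illustration (function name, input shape, output paths), not Cloudflare's implementation:

```python
from collections import defaultdict
from urllib.parse import urlparse

def split_llms_txt(site_title: str, site_url: str,
                   pages: list[tuple[str, str]]) -> dict[str, str]:
    """Render one llms.txt per top-level directory plus a root index.

    `pages` is [(title, url), ...]; returns {output path: file body}.
    """
    # Group pages by the first path segment of their URL.
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for title, url in pages:
        top = urlparse(url).path.strip("/").split("/")[0]
        groups[top].append((title, url))

    files: dict[str, str] = {}
    root = [f"# {site_title}", ""]
    for top in sorted(groups):
        sub_path = f"/{top}/llms.txt"
        body = [f"# {site_title}: {top}", ""]
        body += [f"- [{t}]({u})" for t, u in groups[top]]
        files[sub_path] = "\n".join(body) + "\n"
        # Root file links to each per-directory sub-file.
        root.append(f"- [{top}]({site_url.rstrip('/')}{sub_path})")
    files["/llms.txt"] = "\n".join(root) + "\n"
    return files
```

The root file stays small no matter how many pages the site has; only the per-directory sub-files grow.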

Adoption signal (2026-04)

Cloudflare's isitagentready scanner does not check llms.txt by default; instead it checks markdown content negotiation, which covers the same "agent can get this URL in markdown" guarantee more directly. The llms.txt check is opt-in.
