CONCEPT Cited by 1 source
llms.txt¶
Definition¶
llms.txt is a plain-text file at the root of a website (e.g.
https://example.com/llms.txt) that gives an LLM agent a
structured reading list: what the site is, what's on it, and
where the important content lives — a sitemap written for an LLM
to read rather than a crawler to index.
Proposed in September 2024 by Jeremy Howard and documented at llmstxt.org. The shape is a human-readable markdown skeleton:
# My Site
> A developer platform for building on the edge.
## Documentation
- [Getting Started](https://example.com/docs/start.md)
- [API Reference](https://example.com/docs/api.md)
## Changelog
- [Release Notes](https://example.com/changelog.md)
Each entry is a markdown link to the resource (or its markdown representation); category headings group related resources.
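To make the shape concrete, here is a minimal sketch of how an agent might parse this skeleton. The function name and return shape are illustrative assumptions — the convention fixes only the markdown form (H1 title, `>` summary, H2 section headings, link lists), not any parsing API:

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt skeleton into (title, summary, sections).

    Hypothetical helper: sections maps each H2 heading to a list
    of (link text, URL) pairs found beneath it.
    """
    title, summary = None, None
    sections = {}
    current = None
    link = re.compile(r"-\s*\[([^\]]+)\]\(([^)]+)\)")
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()          # H1: site name
        elif line.startswith("> ") and summary is None:
            summary = line[2:].strip()        # blockquote: one-line summary
        elif line.startswith("## "):
            current = line[3:].strip()        # H2: category heading
            sections[current] = []
        elif current:
            m = link.match(line)
            if m:                             # markdown link entry
                sections[current].append((m.group(1), m.group(2)))
    return title, summary, sections
```

Run against the example skeleton above, this yields the title `My Site`, the one-line summary, and two sections (`Documentation`, `Changelog`) with their links.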
Why it exists¶
Agents have bounded context windows. Serving the full site as
raw HTML is prohibitively expensive in tokens, and asking the
agent to crawl the sitemap XML and then render every HTML
candidate wastes tokens at every step. llms.txt lets the site
author pre-select what's worth reading and in what structure:
author-side curation rather than an agent-side heuristic.
Distinction from Accept: text/markdown / markdown content negotiation¶
- llms.txt = "what to read" (URL-list curation).
- concepts/markdown-content-negotiation = "how to read URL N efficiently" (request markdown instead of HTML at the same URL).
They compose: a well-set-up site exposes an llms.txt listing
markdown URLs, and also lets any URL be fetched as markdown via
content negotiation (with a /index.md fallback for agents that
don't yet send Accept: text/markdown).
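The composed fetch strategy can be sketched as two small helpers. The URL-rewriting rules (append `index.md` to directory paths, `.md` to others) are assumptions for illustration; the only part taken from the text is the `Accept: text/markdown` header and the idea of a `/index.md` fallback:

```python
from urllib.parse import urlsplit, urlunsplit

def markdown_request(url):
    """Request the same URL, negotiating a markdown representation."""
    return {"url": url, "headers": {"Accept": "text/markdown"}}

def index_md_fallback(url):
    """Fallback URL for servers that ignore the Accept header.

    Assumed convention (illustrative): a directory path like
    /docs/ maps to /docs/index.md; other paths get .md appended.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.md"
    elif not path.endswith(".md"):
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

An agent would first try `markdown_request(url)`, then fall back to fetching `index_md_fallback(url)` if the server returns HTML anyway.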
Failure mode: the giant-file problem¶
Large documentation sites naïvely generate one global llms.txt
covering every page. Cloudflare's docs at ~5,000+ pages would
produce a file that exceeds most agents' context windows,
triggering grep-loop iteration over keyword searches: slower,
lossier, and more expensive than reading the whole file.
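A site generator could guard against this before publishing. The ~4-characters-per-token heuristic, the 128k window, and the 25% budget below are illustrative assumptions, not numbers from the source:

```python
def estimated_tokens(text, chars_per_token=4):
    """Rough token estimate (~4 chars/token for English prose).

    A heuristic, not a real tokenizer; good enough for a
    build-time sanity check.
    """
    return len(text) // chars_per_token

def fits_context(text, context_window=128_000, budget=0.25):
    """True if the file consumes at most `budget` of an agent's
    context window (both defaults are assumed, not sourced)."""
    return estimated_tokens(text) <= context_window * budget
```

If `fits_context` fails for the global file, that is the signal to split, as described below.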
Cloudflare's canonical answer (2026-04-17): one llms.txt per
top-level directory, with the root file pointing to each
sub-file. Captured as patterns/split-llms-txt-per-subdirectory.
Pair this with removing directory-listing pages (no semantic
content, only token cost) and with rich per-entry titles and
descriptions (the agent's steering wheel).
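The split scheme can be sketched as a small generator: group page paths by top-level directory, emit one llms.txt body per directory, and have the root file link to each sub-file. File naming, heading text, and link text are assumptions for illustration, not Cloudflare's actual generator:

```python
from collections import defaultdict

def split_llms_txt(base, paths, title="Docs"):
    """Return {output path: llms.txt body} for a per-directory split.

    `base` is the site origin (e.g. "https://example.com");
    `paths` are page paths like "/workers/start.md".
    """
    by_dir = defaultdict(list)
    for p in paths:
        by_dir[p.strip("/").split("/")[0]].append(p)

    files = {}
    root = [f"# {title}", "", "## Sections"]
    for d in sorted(by_dir):
        # One sub-file per top-level directory.
        body = [f"# {title}: {d}", ""]
        body += [f"- [{p.rsplit('/', 1)[-1]}]({base}{p})"
                 for p in sorted(by_dir[d])]
        files[f"/{d}/llms.txt"] = "\n".join(body)
        # Root file points at the sub-file, not at every page.
        root.append(f"- [{d}]({base}/{d}/llms.txt)")
    files["/llms.txt"] = "\n".join(root)
    return files
```

The root file stays small no matter how many pages exist, since it grows with the number of top-level directories rather than the number of pages.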
Adoption signal (2026-04)¶
Cloudflare's isitagentready scanner does not check llms.txt
by default — it checks markdown content negotiation, which
covers the same "agent can get this URL in markdown" guarantee
more directly. Users opt in to the llms.txt check.
Seen in¶
- sources/2026-04-17-cloudflare-introducing-the-agent-readiness-score-is-your-site-agent-ready
— Cloudflare docs generate one llms.txt per top-level directory
(canonical wiki instance); llms.txt + llms-full.txt are the
Cloudflare docs' live instances. Measured benefit vs. non-refined
competitor docs: 31% fewer tokens, 66% faster to a correct answer
on a Kimi-k2.5/OpenCode benchmark.
Related¶
- concepts/markdown-content-negotiation — complementary primitive; llms.txt curates which URLs, markdown content negotiation gets those URLs in a dense format.
- concepts/grep-loop — failure mode triggered by an over-large llms.txt.
- concepts/agent-readiness-score — where llms.txt is an opt-in Content-for-LLMs check.
- concepts/machine-readable-documentation — parent concept; llms.txt is one specific file convention inside this broader posture.
- patterns/split-llms-txt-per-subdirectory — the canonical structural answer for large-doc sites.
- patterns/dynamic-index-md-fallback — the sibling pattern (how, not what) that llms.txt entries typically point at.
- systems/cloudflare-developer-documentation — reference implementation.