llms.txt

Definition

llms.txt is a plain-text file at the root of a website (e.g. https://example.com/llms.txt) that gives an LLM agent a structured reading list: what the site is, what's on it, and where the important content lives — a sitemap written for an LLM to read rather than a crawler to index.

Proposed in September 2024 by Jeremy Howard and documented at llmstxt.org. The format is a human-readable markdown skeleton:

# My Site

> A developer platform for building on the edge.

## Documentation
- [Getting Started](https://example.com/docs/start.md)
- [API Reference](https://example.com/docs/api.md)

## Changelog
- [Release Notes](https://example.com/changelog.md)

Each entry is a markdown link to the resource (or its markdown representation); category headings group related resources.

Why it exists

Agents have bounded context windows. Serving the full site in raw HTML is prohibitively expensive in tokens, and asking the agent to crawl the sitemap XML and then render every HTML candidate wastes tokens at every step. llms.txt lets the site author pre-select what's worth reading and in what structure: author-side curation rather than an agent-side heuristic.

Distinction from markdown content negotiation (Accept: text/markdown)

llms.txt answers "what should I read?"; content negotiation answers "give me this specific URL as markdown." They compose: a well-set-up site exposes llms.txt listing markdown URLs, and also lets any URL be fetched as markdown via content negotiation (with a /index.md fallback for agents that don't yet send Accept: text/markdown).
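The client side of that composition can be sketched as a fetch-with-fallback. The helper below is an assumption (function names and fallback convention are illustrative, not a published API): try the Accept header first, and if the server ignores it, fall back to the /index.md sibling.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

def markdown_fallback_url(url: str) -> str:
    """Conventional /index.md sibling of a page URL (assumed convention)."""
    base = url if url.endswith("/") else url + "/"
    return urljoin(base, "index.md")

def is_markdown_response(content_type: str) -> bool:
    """True if the Content-Type suggests the server honored the Accept header."""
    return "markdown" in content_type or content_type.startswith("text/plain")

def fetch_markdown(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL as markdown, preferring content negotiation.

    1. Request with 'Accept: text/markdown'.
    2. If the response is still HTML, retry the /index.md fallback.
    """
    req = Request(url, headers={"Accept": "text/markdown"})
    with urlopen(req, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        ctype = resp.headers.get("Content-Type", "")
    if is_markdown_response(ctype):
        return body
    with urlopen(Request(markdown_fallback_url(url)), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The two small helpers carry the decision logic; the network call is just plumbing around them.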

Failure mode: the giant-file problem

Large documentation sites naïvely generate one global llms.txt covering every page. Cloudflare's docs, at roughly 5,000+ pages, would produce a file that exceeds most agents' context windows, forcing the agent into grep-loop iteration over keyword searches: slower, lossier, and more expensive than reading a right-sized file whole.

Cloudflare's canonical answer (2026-04-17): one llms.txt per top-level directory, with the root file pointing to each sub-file. Captured as patterns/split-llms-txt-per-subdirectory. Pair this with removing directory-listing pages (no semantic content, only token cost) and with rich per-entry titles and descriptions (the agent's steering wheel).
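The split pattern can be sketched as a generator that groups pages by top-level path segment and emits one sub-file per group plus a root index. Everything here is an assumption for illustration (function name, input shape, output paths), not Cloudflare's implementation:

```python
from collections import defaultdict
from urllib.parse import urlparse

def split_llms_txt(site_title: str, site_url: str,
                   pages: list[tuple[str, str]]) -> dict[str, str]:
    """Render one llms.txt per top-level directory plus a root index.

    `pages` is [(title, url), ...]; returns {output path: file body}.
    """
    # Group pages by the first path segment of their URL.
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for title, url in pages:
        top = urlparse(url).path.strip("/").split("/")[0]
        groups[top].append((title, url))

    files: dict[str, str] = {}
    root = [f"# {site_title}", ""]
    for top in sorted(groups):
        sub_path = f"/{top}/llms.txt"
        body = [f"# {site_title}: {top}", ""]
        body += [f"- [{t}]({u})" for t, u in groups[top]]
        files[sub_path] = "\n".join(body) + "\n"
        # Root file links to each per-directory sub-file.
        root.append(f"- [{top}]({site_url.rstrip('/')}{sub_path})")
    files["/llms.txt"] = "\n".join(root) + "\n"
    return files
```

The root file stays small no matter how many pages the site has; only the per-directory sub-files grow.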

Adoption signal (2026-04)

Cloudflare's isitagentready scanner does not check llms.txt by default; instead it checks markdown content negotiation, which covers the same "agent can get this URL in markdown" guarantee more directly. The llms.txt check is opt-in.
