CONCEPT Cited by 1 source
Sitemap¶
Definition¶
A sitemap is an XML file meant to list every crawlable URL on a site, plus
optional per-URL metadata (last-modified, change frequency, priority). The
format is defined by the Sitemaps protocol at sitemaps.org, first published
in 2005. A site typically publishes the sitemap URL in
robots.txt via a Sitemap: directive.
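As a minimal sketch of what that file looks like and how it can be read, the snippet below parses a small sitemap with Python's standard library. The example.com URLs, dates, and the `parse_sitemap` helper are hypothetical and only serve as illustration; the namespace URI is the one defined by the Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap; the URLs and dates are made up for illustration.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/docs/intro</loc>
    <lastmod>2026-04-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/docs/api</loc>
  </url>
</urlset>
"""


def parse_sitemap(xml_text):
    """Return one dict per <url> entry; optional fields appear only if present."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        entry = {"loc": url.findtext("sm:loc", namespaces=NS).strip()}
        for tag in ("lastmod", "changefreq", "priority"):
            value = url.findtext(f"sm:{tag}", namespaces=NS)
            if value is not None:
                entry[tag] = value.strip()
        entries.append(entry)
    return entries


for entry in parse_sitemap(SITEMAP_XML):
    print(entry)
```

Only `loc` is required per entry, which is why the second URL above carries no metadata at all.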
Why it exists¶
Before sitemaps, crawlers had to traverse the link graph —
follow every <a href> from the home page, discover pages, then
re-traverse on each crawl cycle. Sitemaps let the site hand the
crawler a URL list directly, with freshness metadata that short-
circuits re-crawls of unchanged pages.
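The freshness shortcut can be sketched as a comparison between each entry's last-modified value and the date the crawler last fetched that URL. This is an illustrative assumption about how a crawler might use the metadata, not a description of any particular crawler; the `needs_recrawl` helper and the `last_crawled` mapping are hypothetical names.

```python
from datetime import date


def needs_recrawl(entry, last_crawled):
    """Decide whether a sitemap entry should be fetched again.

    entry: dict with "loc" and an optional "lastmod" string.
    last_crawled: dict mapping URL -> date of the previous fetch.
    """
    seen = last_crawled.get(entry["loc"])
    if seen is None:
        return True  # never fetched before
    lastmod = entry.get("lastmod")
    if not lastmod:
        return True  # no freshness hint, so be conservative
    try:
        # Use only the date part; lastmod may also carry a time and offset.
        modified = date.fromisoformat(lastmod[:10])
    except ValueError:
        return True  # coarse values such as "2026" cannot be compared by day
    return modified >= seen


entries = [
    {"loc": "https://example.com/a", "lastmod": "2026-04-10"},
    {"loc": "https://example.com/b", "lastmod": "2026-03-01T08:00:00+00:00"},
    {"loc": "https://example.com/c"},
]
last_crawled = {
    "https://example.com/a": date(2026, 4, 1),
    "https://example.com/b": date(2026, 4, 1),
    "https://example.com/c": date(2026, 4, 1),
}
to_fetch = [e["loc"] for e in entries if needs_recrawl(e, last_crawled)]
print(to_fetch)
```

In this sketch only the unchanged page (`/b`) is skipped; anything without a usable last-modified value is fetched again, since the metadata is optional and not every site maintains it.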
Role for AI agents¶
The sitemap is the agent's map of everything the site has. Unlike
llms.txt (which is curated and compact),
sitemaps are exhaustive and verbose. An agent typically
consults the sitemap via the pointer in robots.txt, picks
candidate URLs by path/title, and fetches those with
markdown content
negotiation where supported.
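The flow described above can be sketched in three small steps using only the Python standard library. The helper names, the example.com URLs, and the keyword-matching heuristic are hypothetical; the `Accept: text/markdown` header is an assumption about how markdown content negotiation is requested, and a server that does not support it would simply return HTML.

```python
from urllib.parse import urlparse
from urllib.request import Request
from urllib.robotparser import RobotFileParser


def find_sitemaps(robots_txt):
    """Return the sitemap URLs advertised by Sitemap: lines in robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when absent


def pick_candidates(urls, keywords):
    """Keep URLs whose path contains any keyword (a deliberately crude filter)."""
    wanted = [k.lower() for k in keywords]
    return [u for u in urls if any(k in urlparse(u).path.lower() for k in wanted)]


def markdown_request(url):
    """Build (but do not send) a request that prefers markdown over HTML."""
    return Request(url, headers={"Accept": "text/markdown, text/html;q=0.8"})


# Hypothetical robots.txt and sitemap contents.
robots_txt = "User-agent: *\nAllow: /\nSitemap: https://example.com/sitemap.xml\n"
sitemap_urls = [
    "https://example.com/docs/api/auth",
    "https://example.com/blog/2026/launch",
    "https://example.com/docs/api/rate-limits",
]

print(find_sitemaps(robots_txt))
for url in pick_candidates(sitemap_urls, ["api"]):
    request = markdown_request(url)
    print(request.full_url, request.get_header("Accept"))
```

The requests are only constructed here; passing one to `urllib.request.urlopen` would perform the actual fetch.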
Limitations for agents¶
- No semantic content: just URLs and metadata; the agent still has to fetch each page to understand what's on it.
- Directory-listing pages pollute sitemaps: they're
structurally valid URLs with zero semantic value. Cloudflare's
2026-04-17 dogfooding exercise explicitly removes ~450 directory-listing
pages from its
llms.txt for exactly this reason, while leaving them in the sitemap.
- No agent-sized cap: the protocol limits a single file to 50,000 URLs (50 MB uncompressed), but sitemap index files chain many such files together, so very large sites produce sitemaps too big to fit in an agent's context window; they're designed for machine-scale parsing, not for a single LLM read.
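One way an agent might cope with the size problem is to stream the sitemap and stop after a fixed number of URLs instead of loading the whole file. The sketch below assumes that approach; `iter_locs`, `first_urls`, and the generated example.com sitemap are hypothetical, and the limit of 5 is arbitrary.

```python
import io
import itertools
import xml.etree.ElementTree as ET

# Clark-notation prefix for the Sitemaps protocol namespace.
SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def iter_locs(stream):
    """Yield <loc> values one at a time without building the full tree."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == SM + "url":
            loc = elem.findtext(SM + "loc")
            if loc:
                yield loc.strip()
            elem.clear()  # drop the entry's children once it has been read


def first_urls(stream, limit):
    """Return at most `limit` URLs, reading no further than necessary."""
    return list(itertools.islice(iter_locs(stream), limit))


# Build a made-up sitemap with 1,000 entries to stand in for a large site.
body = "".join(
    f"<url><loc>https://example.com/page/{i}</loc></url>" for i in range(1000)
)
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    + body
    + "</urlset>"
).encode("utf-8")

sample = first_urls(io.BytesIO(xml_bytes), limit=5)
print(sample)
```

Capping the read keeps the result small enough to reason over, at the cost of seeing only the entries that happen to come first; a real agent would likely combine this with path filtering.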
Canonical agent-era role¶
The Agent Readiness Score
(concepts/agent-readiness-score) grades sitemap presence under
the Agent Discovery dimension — alongside robots.txt and the
Link: response header — as
one of three discovery primitives an agent can use to enumerate a
site's content.
Seen in¶
- sources/2026-04-17-cloudflare-introducing-the-agent-readiness-score-is-your-site-agent-ready — canonical Agent-Discovery-dimension instance in the Agent Readiness Score.
Related¶
- concepts/robots-txt — where sitemaps are typically advertised.
- concepts/llms-txt — curated, agent-centric alternative to the exhaustive sitemap.
- concepts/agent-readiness-score — where sitemap presence is graded.