CONCEPT Cited by 1 source
Markdown sitemap
Definition
A markdown sitemap is a site's URL-index served at a
well-known path (e.g. /blog/sitemap.md, /docs/sitemap.md)
as a hierarchical markdown table of contents instead of a
flat XML URL list. Each entry is a markdown link with the
page's human-readable title; nested sections use indented
markdown lists to preserve parent-child relationships.
The primitive is a close cousin of the XML
sitemap and llms.txt but occupies a
distinct point in the agent-discovery design space.
Why it exists
Vercel's 2026-04-21 framing:
"XML sitemaps are flat lists of URLs with no titles, no hierarchy, and no indication of what each page is about. A markdown sitemap gives agents a structured table of contents with human-readable titles and parent-child relationships, so they can understand what content exists on your site and navigate to what they need."
For an agent:
- XML sitemap: machine-parseable but semantically thin; the agent learns that URLs exist but not what they are about, so it has to fetch each URL to learn anything.
- llms.txt: curated, human-authored, and agent-optimized, but bounded to what the author chose to include. Not exhaustive.
- Markdown sitemap: exhaustive like XML but titled and hierarchical like llms.txt; positioned between the two.
Canonical Vercel shapes
Vercel ships two variants in the 2026-04-21 post:
Flat, date-sorted: used for blog posts, where reverse-chronological ordering is the natural structure. The output is a single-level markdown list:
# Blog sitemap
- [How we made global routing faster with Bloom filters](/blog/how-we-made-global-routing-faster-with-bloom-filters.md)
- [Inside Workflow DevKit: how framework integrations work](/blog/inside-workflow-devkit-how-framework-integrations-work.md)
- ...
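The flat variant can be sketched in a few lines. This is an illustrative sketch, not Vercel's code: the Post type, its publishedAt field (assumed to be an ISO 8601 date string), and the renderBlogSitemap name are all hypothetical.

```typescript
// Hypothetical post record; a real CMS or content layer has its own shape.
interface Post {
  title: string;
  path: string;
  publishedAt: string; // assumed ISO 8601 date, e.g. "2026-04-21"
}

// Render posts as a single-level markdown list, newest first.
function renderBlogSitemap(posts: Post[]): string {
  const lines = posts
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => Date.parse(b.publishedAt) - Date.parse(a.publishedAt))
    .map((post) => `- [${post.title}](${post.path})`);
  return ["# Blog sitemap", ""].concat(lines).join("\n");
}

// Placeholder posts, for illustration only.
console.log(
  renderBlogSitemap([
    { title: "Older post", path: "/blog/older-post.md", publishedAt: "2025-01-01" },
    { title: "Newer post", path: "/blog/newer-post.md", publishedAt: "2026-01-01" },
  ]),
);
```

Sorting a copy rather than the input keeps the renderer free of side effects, which matters if the same post array is reused for other routes.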
Hierarchical, nested: used for docs, where parent-child grouping is load-bearing. A recursive renderer preserves nesting with a two-space indent per level:
# Documentation sitemap
- [Getting Started](/docs/getting-started)
  - [Installation](/docs/getting-started/installation)
  - [First Deployment](/docs/getting-started/first-deployment)
- [Concepts](/docs/concepts)
  - [Projects](/docs/concepts/projects)
  - [Deployments](/docs/concepts/deployments)
...
The canonical recursive-renderer shape is five lines of TypeScript (see sources/2026-04-21-vercel-making-agent-friendly-pages-with-content-negotiation for full code). The pattern generalises to any static-site generator with a tree of pages: Docusaurus, VitePress, Next.js MDX, Astro, etc.
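For orientation, a recursive renderer of this kind might look like the sketch below. It is not the code from the Vercel post; the PageNode type and the renderSitemap name are hypothetical, and a real static-site generator would supply its own page-tree shape.

```typescript
// Hypothetical page-tree node.
interface PageNode {
  title: string;
  path: string;
  children?: PageNode[];
}

// Render a tree of pages as a nested markdown list,
// indenting two spaces per nesting level.
function renderSitemap(nodes: PageNode[], depth = 0): string {
  return nodes
    .map((node) => {
      const line = `${"  ".repeat(depth)}- [${node.title}](${node.path})`;
      const children = node.children ?? [];
      return children.length > 0
        ? line + "\n" + renderSitemap(children, depth + 1)
        : line;
    })
    .join("\n");
}

const docs: PageNode[] = [
  {
    title: "Getting Started",
    path: "/docs/getting-started",
    children: [
      { title: "Installation", path: "/docs/getting-started/installation" },
    ],
  },
  { title: "Concepts", path: "/docs/concepts" },
];

console.log("# Documentation sitemap\n\n" + renderSitemap(docs));
```

The depth parameter is the only state the recursion carries, so the same function works for trees of any depth.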
Why hierarchy matters for agents
An agent that fetches /docs/sitemap.md and sees
- [Databases](/docs/databases)
  - [Vercel Postgres](/docs/databases/vercel-postgres)
  - [Neon](/docs/databases/neon)
  - [Supabase](/docs/databases/supabase)
now knows that "Neon" is a child of "Databases" — useful context that flat XML sitemaps cannot carry. For queries like "what database options does this site document?", the agent can answer from the sitemap alone without fetching any individual page. The parent-child relationship is the semantic primitive that makes a sitemap more than a URL list.
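As an illustration of answering such a query from the sitemap alone, the sketch below parses a nested markdown list into a tree and reads off the children of one entry. The SitemapEntry type and parseSitemap name are hypothetical, and the parser assumes exactly two spaces of indentation per level and one link per line.

```typescript
interface SitemapEntry {
  title: string;
  url: string;
  children: SitemapEntry[];
}

// Parse a nested markdown list of links into a tree.
// Assumes two spaces of indentation per nesting level.
function parseSitemap(markdown: string): SitemapEntry[] {
  const roots: SitemapEntry[] = [];
  const stack: SitemapEntry[] = []; // stack[d] is the latest entry at depth d
  const linkLine = /^(\s*)- \[(.+)\]\((\S+)\)\s*$/;

  for (const line of markdown.split("\n")) {
    const match = linkLine.exec(line);
    if (!match) continue; // skip headings, blank lines, and "..." lines
    const depth = Math.floor(match[1].length / 2);
    const entry: SitemapEntry = { title: match[2], url: match[3], children: [] };
    // Discard entries at this depth or deeper; what remains ends with the parent.
    stack.length = Math.min(depth, stack.length);
    const parent = stack[stack.length - 1];
    (parent ? parent.children : roots).push(entry);
    stack.push(entry);
  }
  return roots;
}

const sample = [
  "- [Databases](/docs/databases)",
  "  - [Vercel Postgres](/docs/databases/vercel-postgres)",
  "  - [Neon](/docs/databases/neon)",
  "  - [Supabase](/docs/databases/supabase)",
].join("\n");

const databases = parseSitemap(sample).find((e) => e.title === "Databases");
const options = databases ? databases.children.map((e) => e.title) : [];
console.log(options); // titles of the pages listed under "Databases"
```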
Composition with concepts/markdown-content-negotiation
Markdown sitemaps and markdown content negotiation are complementary, not competing:
- Markdown sitemap answers "what URLs exist?"
- Markdown content negotiation answers "get me this URL as markdown"
An agent that lands on the markdown sitemap reads the list,
picks candidate URLs by title-match, and fetches each with
Accept: text/markdown for efficient consumption. The two
primitives together give the agent cheap discovery + cheap
retrieval — the workflow that llms.txt + /index.md
also supports on the Cloudflare side, via different
primitives.
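The discovery-then-retrieval flow described above can be sketched as two small functions. The names pickCandidates, fetchAsMarkdown, and FetchLike are hypothetical, the origin https://example.com is a placeholder, and title matching is reduced to a case-insensitive substring test for brevity.

```typescript
// Minimal fetch-like signature so the sketch can run against a stub.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; text(): Promise<string> }>;

// Discovery: keep links whose title contains the keyword,
// resolving relative hrefs against the site origin.
function pickCandidates(sitemap: string, keyword: string, origin: string): string[] {
  const link = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  const needle = keyword.toLowerCase();
  const urls: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = link.exec(sitemap)) !== null) {
    if (m[1].toLowerCase().includes(needle)) {
      urls.push(new URL(m[2], origin).href);
    }
  }
  return urls;
}

// Retrieval: ask for the markdown representation via content negotiation.
async function fetchAsMarkdown(url: string, fetchFn: FetchLike): Promise<string> {
  const res = await fetchFn(url, { headers: { Accept: "text/markdown" } });
  if (!res.ok) throw new Error(`request failed for ${url}`);
  return res.text();
}

const sitemapText = "- [Databases](/docs/databases)\n  - [Neon](/docs/databases/neon)";
console.log(pickCandidates(sitemapText, "neon", "https://example.com"));
```

Passing the fetch function as a parameter keeps the sketch testable offline; in a runtime that provides a global fetch, that function can be passed in directly.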
Differences from alternatives
| Primitive | Format | Hierarchy | Titles | Exhaustive | Agent-parseable |
|---|---|---|---|---|---|
| XML sitemap | XML | No | No | Yes | Machine-only |
| llms.txt | markdown | Yes | Yes | Curated | LLM-friendly |
| Markdown sitemap | markdown | Yes | Yes | Yes | LLM-friendly |
| robots.txt Sitemap: directive | text | No | No | Points at XML | Machine-only |
Markdown sitemaps and llms.txt are the two LLM-friendly
primitives; the markdown sitemap trades llms.txt's
curation for exhaustive coverage.
Failure modes and caveats
- Context-window overflow on large sites. A site with 100,000 docs pages produces a sitemap too large for a single LLM read. Mitigation: split by section (per-product /docs/<product>/sitemap.md), which matches the split llms.txt per subdirectory pattern for llms.txt files.
- Staleness without rebuild. Vercel's implementation sets export const dynamic = 'force-static', so the sitemap is built statically and CMS content changes require a revalidate or rebuild to surface. For rapidly changing content, switch to a dynamic route handler or a short-TTL remote cache.
- No lastmod metadata equivalent. XML sitemaps carry <lastmod> per URL; markdown sitemaps don't have a standard way to encode it. An agent that wants freshness metadata has to either fetch each page or fall back to the XML sitemap.
- Path vs URL ambiguity in links. Markdown links in a sitemap may be relative (/blog/post.md) or absolute (https://vercel.com/blog/post.md). Agents typically handle both, but server-side middleware that rewrites paths should be consistent about which shape the sitemap uses.
- No query-string / fragment semantics. A URL like /blog/post#section-2 is a page anchor; the sitemap typically only lists page-level URLs. Sub-page navigation requires the agent to fetch the page and parse it.
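On the consumer side, the last two caveats can be handled by normalizing every link before use. The sketch below relies on the standard URL class; the function names normalizeLink and pageLevelUrls are hypothetical, and https://example.com is a placeholder base.

```typescript
// Resolve a sitemap link to an absolute, page-level URL:
// relative paths are resolved against the base, and any
// fragment (#section) is dropped.
function normalizeLink(href: string, base: string): string {
  const url = new URL(href, base);
  url.hash = "";
  return url.href;
}

// Normalize a list of links and remove duplicates that only
// differed by shape (relative vs absolute) or by fragment.
function pageLevelUrls(hrefs: string[], base: string): string[] {
  return Array.from(new Set(hrefs.map((href) => normalizeLink(href, base))));
}

console.log(
  pageLevelUrls(
    ["/blog/post", "https://example.com/blog/post", "/blog/post#section-2"],
    "https://example.com",
  ),
);
```

Query strings are left untouched here, since whether they identify distinct pages depends on the site.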
Discovery
How does an agent find the markdown sitemap? Three options:
- Advertise in llms.txt: the llms.txt file can contain a link to /blog/sitemap.md.
- link rel="alternate" in HTML <head>: the Vercel pattern (see patterns/link-rel-alternate-markdown-discovery).
- Convention: agents probe /sitemap.md, /<section>/sitemap.md, or /docs/sitemap.md by default when looking for agent-friendly indexes.
No formal standard has landed as of 2026-04; discovery is by
convention and by advertisement through llms.txt or
link rel tags.
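Probing by convention might look like the following sketch. Because no standard defines these paths, the candidate list and its order are assumptions, as are the probeSitemap and FetchLike names; the fetch function is injected so the logic can be exercised with a stub.

```typescript
// Minimal fetch-like signature so the sketch can run against a stub.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; text(): Promise<string> }>;

// Assumed probe order; adjust per site conventions.
const CANDIDATE_PATHS = ["/sitemap.md", "/docs/sitemap.md", "/blog/sitemap.md"];

// Return the first candidate URL that responds successfully, or null.
async function probeSitemap(
  origin: string,
  fetchFn: FetchLike,
  paths: string[] = CANDIDATE_PATHS,
): Promise<string | null> {
  for (const path of paths) {
    const url = new URL(path, origin).href;
    try {
      const res = await fetchFn(url, { headers: { Accept: "text/markdown" } });
      if (res.ok) return url;
    } catch {
      // Network error: fall through and try the next candidate.
    }
  }
  return null;
}
```

An agent would typically try advertised locations (llms.txt or link rel tags) first and fall back to a probe like this only when nothing is advertised.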
Seen in
- sources/2026-04-21-vercel-making-agent-friendly-pages-with-content-negotiation
— canonical wiki instance. Vercel's implementation
for /blog/sitemap.md (flat) and /docs/sitemap.md (hierarchical, recursive). Both implementations are shown as full Next.js route-handler snippets. Positioned as a companion primitive to markdown content negotiation.
Related
- concepts/sitemap — the XML-sitemap parent concept; markdown sitemap contrasts with it on hierarchy and titles.
- concepts/markdown-content-negotiation — the complementary retrieval primitive.
- concepts/llms-txt — the curated alternative; markdown sitemap is the exhaustive alternative.
- concepts/agent-readiness-score — scores sitemap presence as an Agent Discovery dimension; as of 2026-04 scored XML-shape only, but the category extends naturally to markdown.
- patterns/link-rel-alternate-markdown-discovery — the third layer of Vercel's discovery stack; can advertise a markdown sitemap or individual markdown URLs.
- systems/nextjs — the framework in which Vercel's
reference implementation is built;
force-static route handlers generate the sitemap at build time.