CONCEPT Cited by 1 source

Markdown sitemap

Definition

A markdown sitemap is a site's URL index served at a well-known path (e.g. /blog/sitemap.md, /docs/sitemap.md) as a hierarchical markdown table of contents instead of a flat XML URL list. Each entry is a markdown link with the page's human-readable title; nested sections use indented markdown lists to preserve parent-child relationships.

The primitive is a close cousin of the XML sitemap and llms.txt, but it occupies a distinct point in the agent-discovery design space.

Why it exists

Vercel's 2026-04-21 framing:

"XML sitemaps are flat lists of URLs with no titles, no hierarchy, and no indication of what each page is about. A markdown sitemap gives agents a structured table of contents with human-readable titles and parent-child relationships, so they can understand what content exists on your site and navigate to what they need."

For an agent:

XML sitemap — machine-parseable but semantically thin; you know URLs exist but not what they're about. Has to be combined with per-URL fetches to learn anything.
llms.txt — curated, human-authored, agent-optimized, but bounded to what the author chose to include. Not exhaustive.
Markdown sitemap — exhaustive like XML but titled and hierarchical like llms.txt; positioned between the two.

Canonical Vercel shapes

Vercel ships two variants in the 2026-04-21 post:

Flat, date-sorted: for blog posts, where reverse-chronological ordering is the natural structure. A single-level markdown list:

# Blog sitemap

- [How we made global routing faster with Bloom filters](/blog/how-we-made-global-routing-faster-with-bloom-filters.md)
- [Inside Workflow DevKit: how framework integrations work](/blog/inside-workflow-devkit-how-framework-integrations-work.md)
- ...
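A minimal TypeScript sketch of the flat variant, assuming a hypothetical `Post` record with `title`, `slug`, and ISO-8601 `publishedAt` fields (the real data source is not specified here). It sorts newest-first and emits one link per post; escaping `[` and `]` in titles is an added precaution, not something the Vercel post describes.

```typescript
// Hypothetical post record; field names are assumptions for illustration.
interface Post {
  title: string;
  slug: string;
  publishedAt: string; // ISO 8601 date, e.g. "2026-04-21"
}

// Escape square brackets so a title cannot terminate the link text early.
function escapeTitle(title: string): string {
  return title.replace(/([\[\]])/g, "\\$1");
}

// Render a single-level markdown list in reverse-chronological order.
function renderBlogSitemap(posts: Post[]): string {
  const sorted = posts
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => Date.parse(b.publishedAt) - Date.parse(a.publishedAt));
  const lines = sorted.map((p) => `- [${escapeTitle(p.title)}](/blog/${p.slug}.md)`);
  return ["# Blog sitemap", "", ...lines].join("\n") + "\n";
}
```

Sorting could equally happen in the data query; doing it in the renderer just keeps the sketch self-contained.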

Hierarchical, nested: for docs, where parent-child grouping is load-bearing. A recursive renderer preserves nesting with a two-space indent per level:

# Documentation sitemap

- [Getting Started](/docs/getting-started)
  - [Installation](/docs/getting-started/installation)
  - [First Deployment](/docs/getting-started/first-deployment)
- [Concepts](/docs/concepts)
  - [Projects](/docs/concepts/projects)
  - [Deployments](/docs/concepts/deployments)
...

The canonical recursive-renderer shape is five lines of TypeScript (see sources/2026-04-21-vercel-making-agent-friendly-pages-with-content-negotiation for full code). The pattern generalises to any static-site generator with a tree of pages: Docusaurus, VitePress, Next.js MDX, Astro, etc.
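The full route handler lives in the cited source and is not reproduced here. The sketch below is an independent, hypothetical recursive renderer in TypeScript that produces the nested shape shown above: one `- [title](href)` line per node, indented two spaces per depth level. The `PageNode` tree type is an assumption standing in for whatever page tree a given static-site generator exposes.

```typescript
// Hypothetical page-tree node; stands in for a generator's own page tree.
interface PageNode {
  title: string;
  href: string;
  children?: PageNode[];
}

// Depth-first walk: emit the node, then its children one level deeper.
function renderTree(nodes: PageNode[], depth = 0, out: string[] = []): string[] {
  for (const node of nodes) {
    out.push(`${"  ".repeat(depth)}- [${node.title}](${node.href})`);
    renderTree(node.children ?? [], depth + 1, out);
  }
  return out;
}

function renderDocsSitemap(roots: PageNode[]): string {
  return ["# Documentation sitemap", "", ...renderTree(roots)].join("\n") + "\n";
}
```

Because indentation alone carries the hierarchy, the renderer needs no state beyond the current depth.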

Why hierarchy matters for agents

An agent that fetches /docs/sitemap.md and sees

- [Databases](/docs/databases)
  - [Vercel Postgres](/docs/databases/vercel-postgres)
  - [Neon](/docs/databases/neon)
  - [Supabase](/docs/databases/supabase)

now knows that "Neon" is a child of "Databases" — useful context that flat XML sitemaps cannot carry. For queries like "what database options does this site document?", the agent can answer from the sitemap alone without fetching any individual page. The parent-child relationship is the semantic primitive that makes a sitemap more than a URL list.
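To make the agent-side claim concrete, here is a hedged TypeScript sketch that rebuilds the parent-child tree from a markdown sitemap and answers a children-of query without fetching any page. It assumes the two-space-per-level list shape used above and ignores lines that are not list links (headings, blank lines, `...` placeholders); `parseSitemap` and `findEntry` are hypothetical names.

```typescript
interface Entry {
  title: string;
  href: string;
  children: Entry[];
}

// "<indent>- [title](href)", assuming two spaces of indent per level.
const LINK_LINE = /^( *)- \[(.+?)\]\((\S+?)\)\s*$/;

function parseSitemap(markdown: string): Entry[] {
  const roots: Entry[] = [];
  const stack: Entry[] = []; // stack[d] = most recent entry at depth d
  for (const line of markdown.split("\n")) {
    const m = LINK_LINE.exec(line);
    if (!m) continue; // headings, blank lines, "..." placeholders
    const depth = Math.floor(m[1].length / 2);
    const entry: Entry = { title: m[2], href: m[3], children: [] };
    stack.length = depth; // forget deeper entries from the previous branch
    const parent = depth > 0 ? stack[depth - 1] : undefined;
    // A malformed jump in indentation leaves no parent; fall back to the root.
    (parent ? parent.children : roots).push(entry);
    stack[depth] = entry;
  }
  return roots;
}

// Depth-first search by exact title.
function findEntry(entries: Entry[], title: string): Entry | undefined {
  for (const e of entries) {
    if (e.title === title) return e;
    const hit = findEntry(e.children, title);
    if (hit) return hit;
  }
  return undefined;
}
```

For the excerpt above, `findEntry(parseSitemap(md), "Databases")?.children.map((c) => c.title)` lists the three database pages, answering "what database options does this site document?" from the sitemap alone.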

Composition with markdown content negotiation

Markdown sitemaps and markdown content negotiation are complementary, not competing:

Markdown sitemap answers "what URLs exist?"
Markdown content negotiation answers "get me this URL as markdown"

An agent that lands on the markdown sitemap reads the list, picks candidate URLs by title match, and fetches each with an Accept: text/markdown header for efficient consumption. Together the two primitives give the agent cheap discovery and cheap retrieval, the same workflow that llms.txt plus /index.md supports on the Cloudflare side through different primitives.
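A sketch of that discover-then-retrieve loop in TypeScript, under stated assumptions: title matching is a naive every-word substring test, relative links are resolved against the sitemap's own URL, `#fragment`s are dropped because the sitemap lists pages, and the fetch function is injected (the hypothetical `Fetcher` type below) so the example does not depend on a particular runtime's typings. Sending `Accept: text/markdown` is the content-negotiation step; the server decides whether to honor it.

```typescript
// Minimal fetch-like signature; the global fetch in modern runtimes should satisfy it.
type Fetcher = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; text(): Promise<string> }>;

interface Link {
  title: string;
  url: string;
}

// Collect every "[title](href)" pair, resolving relative hrefs and stripping fragments.
// Assumes titles contain no square brackets.
function extractLinks(sitemap: string, sitemapUrl: string): Link[] {
  const links: Link[] = [];
  const re = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(sitemap)) !== null) {
    const url = new URL(m[2], sitemapUrl);
    url.hash = ""; // keep page-level URLs only
    links.push({ title: m[1], url: url.toString() });
  }
  return links;
}

// Naive title match: keep links whose title contains every query word.
function pickCandidates(links: Link[], query: string): Link[] {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  return links.filter((l) => words.every((w) => l.title.toLowerCase().includes(w)));
}

// Fetch each candidate, asking the server for markdown via content negotiation.
async function fetchAsMarkdown(links: Link[], fetcher: Fetcher): Promise<string[]> {
  const bodies: string[] = [];
  for (const l of links) {
    const res = await fetcher(l.url, { headers: { Accept: "text/markdown" } });
    if (res.ok) bodies.push(await res.text());
  }
  return bodies;
}
```

Fetching sequentially keeps the sketch simple; a production agent might cap concurrency instead.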

Differences from alternatives

Primitive	Format	Hierarchy	Titles	Exhaustive	Agent-parseable
XML sitemap	XML	No	No	Yes	Machine-only
`llms.txt`	markdown	Yes	Yes	Curated	LLM-friendly
Markdown sitemap	markdown	Yes	Yes	Yes	LLM-friendly
`robots.txt` Sitemap: directive	text	No	No	Points at XML	Machine-only

Markdown sitemaps and llms.txt are the two LLM-friendly primitives; the markdown sitemap trades llms.txt's curation for exhaustive coverage.

Failure modes and caveats

Context-window overflow on large sites. A site with 100,000 docs pages produces a sitemap too large for a single LLM read. Mitigation: split by section (e.g. a per-product /docs/<product>/sitemap.md), mirroring the pattern of splitting llms.txt per subdirectory.
Staleness without rebuild. Vercel's implementation uses export const dynamic = 'force-static';, so the sitemap is built statically and CMS content changes only surface after a revalidation or rebuild. For rapidly changing content, switch to a dynamic route handler or a short-TTL remote cache.
No lastmod metadata equivalent. XML sitemaps carry <lastmod> per URL; markdown sitemaps don't have a standard way to encode it. An agent that wants freshness metadata has to either fetch each page or fall back to the XML sitemap.
Path vs URL ambiguity in links. Markdown links in a sitemap may be relative (/blog/post.md) or absolute (https://vercel.com/blog/post.md). Agents typically handle both, but server-side middleware that rewrites paths should be consistent about which shape the sitemap uses.
No query-string / fragment semantics. A URL like /blog/post#section-2 is a page anchor; the sitemap typically only lists page-level URLs. Sub-page navigation requires the agent to fetch the page and parse it.
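For the context-window caveat, a hedged TypeScript sketch of the split-by-section mitigation: each top-level section gets its own `<section-href>/sitemap.md`, and a small root index links to them. The path layout, the `PageNode` type, and the function names are assumptions for illustration. Links are emitted as site-relative paths throughout, in keeping with the advice to be consistent about relative versus absolute links.

```typescript
interface PageNode {
  title: string;
  href: string;
  children?: PageNode[];
}

// Same two-space-per-level rendering as the nested docs sitemap.
function renderTree(nodes: PageNode[], depth = 0, out: string[] = []): string[] {
  for (const node of nodes) {
    out.push(`${"  ".repeat(depth)}- [${node.title}](${node.href})`);
    renderTree(node.children ?? [], depth + 1, out);
  }
  return out;
}

// Map from output path to file contents: one sitemap per section plus a root index.
function splitBySection(basePath: string, sections: PageNode[]): Map<string, string> {
  const files = new Map<string, string>();
  const index: string[] = ["# Sitemap index", ""];
  for (const section of sections) {
    const path = `${section.href}/sitemap.md`;
    const body = [`# ${section.title} sitemap`, "", ...renderTree([section])];
    files.set(path, body.join("\n") + "\n");
    index.push(`- [${section.title}](${path})`);
  }
  files.set(`${basePath}/sitemap.md`, index.join("\n") + "\n");
  return files;
}
```

An agent can then read the small index first and fetch only the section sitemap it needs; a section that is itself too large could be split again the same way.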

Discovery

How does an agent find the markdown sitemap? Three options:

Advertise in llms.txt — the llms.txt file can contain a link to /blog/sitemap.md.
link rel="alternate" in HTML <head> — the Vercel pattern (see patterns/link-rel-alternate-markdown-discovery).
Convention — agents probe /sitemap.md, /<section>/sitemap.md, or /docs/sitemap.md by default when looking for agent-friendly indexes.
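The three discovery routes can be combined into an ordered candidate list. The TypeScript sketch below is hypothetical: it tries links advertised in llms.txt first, then `<link rel="alternate">` tags in the page HTML, then the conventional paths, keeping only targets ending in `sitemap.md`. It uses regular expressions rather than an HTML parser to stay short, and the `type="text/markdown"` attribute value is an assumption about how the alternate link is labeled.

```typescript
// Hrefs of <link> tags declaring rel="alternate" and a markdown type (attribute order does not matter).
function linkAlternates(html: string): string[] {
  const hrefs: string[] = [];
  const tagRe = /<link\b[^>]*>/gi;
  let tag: RegExpExecArray | null;
  while ((tag = tagRe.exec(html)) !== null) {
    const t = tag[0];
    if (!/\brel=["']?alternate\b/i.test(t) || !/text\/markdown/i.test(t)) continue;
    const href = /\bhref=["']([^"']+)["']/i.exec(t);
    if (href) hrefs.push(href[1]);
  }
  return hrefs;
}

interface DiscoveryInputs {
  llmsTxt?: string; // body of /llms.txt, if the site has one
  html?: string; // HTML of any page on the site
  section?: string; // e.g. "blog", for the /<section>/sitemap.md convention
}

// Ordered, de-duplicated absolute URLs worth probing for a markdown sitemap.
function sitemapCandidates(origin: string, inputs: DiscoveryInputs): string[] {
  const isSitemap = (href: string) => /sitemap\.md$/.test(href);
  const found: string[] = [];
  // 1. Links advertised in llms.txt
  const llms = inputs.llmsTxt ?? "";
  const linkRe = /\]\(([^)\s]+)\)/g;
  let m: RegExpExecArray | null;
  while ((m = linkRe.exec(llms)) !== null) {
    if (isSitemap(m[1])) found.push(m[1]);
  }
  // 2. <link rel="alternate"> tags in the page head
  found.push(...linkAlternates(inputs.html ?? "").filter(isSitemap));
  // 3. Conventional paths, as a last resort
  found.push("/sitemap.md");
  if (inputs.section) found.push(`/${inputs.section}/sitemap.md`);
  found.push("/docs/sitemap.md");
  // Resolve against the origin; Set keeps the first occurrence's position.
  return Array.from(new Set(found.map((href) => new URL(href, origin).toString())));
}
```

An agent would probe each candidate in order and stop at the first that returns a markdown list; since no standard has landed, none of these routes is guaranteed to exist on a given site.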

No formal standard has landed as of 2026-04; discovery is by convention and by advertisement through llms.txt or link rel tags.

Seen in

sources/2026-04-21-vercel-making-agent-friendly-pages-with-content-negotiation — canonical wiki instance. Vercel's implementation for /blog/sitemap.md (flat) and /docs/sitemap.md (hierarchical, recursive). Both implementations shown as full Next.js route-handler snippets. Positioned as a companion primitive to markdown content negotiation.

concepts/sitemap — the XML-sitemap parent concept; markdown sitemap contrasts with it on hierarchy and titles.
concepts/markdown-content-negotiation — the complementary retrieval primitive.
concepts/llms-txt — the curated alternative; markdown sitemap is the exhaustive alternative.
concepts/agent-readiness-score — scores sitemap presence as an Agent Discovery dimension; as of 2026-04 scored XML-shape only, but the category extends naturally to markdown.
patterns/link-rel-alternate-markdown-discovery — the third layer of Vercel's discovery stack; can advertise a markdown sitemap or individual markdown URLs.
systems/nextjs — the framework in which Vercel's reference implementation is built; force-static route handlers generate the sitemap at build time.