
CONCEPT Cited by 1 source

Markdown sitemap

Definition

A markdown sitemap is a site's URL index served at a well-known path (e.g. /blog/sitemap.md, /docs/sitemap.md) as a hierarchical markdown table of contents instead of a flat XML URL list. Each entry is a markdown link whose text is the page's human-readable title; nested sections use indented markdown lists to preserve parent-child relationships.

The primitive is a close cousin of the XML sitemap and llms.txt, but it occupies a distinct point in the agent-discovery design space.

Why it exists

Vercel's 2026-04-21 framing:

"XML sitemaps are flat lists of URLs with no titles, no hierarchy, and no indication of what each page is about. A markdown sitemap gives agents a structured table of contents with human-readable titles and parent-child relationships, so they can understand what content exists on your site and navigate to what they need."

For an agent:

  • XML sitemap — machine-parseable but semantically thin; you know URLs exist but not what they're about. Has to be combined with per-URL fetches to learn anything.
  • llms.txt — curated, human-authored, agent-optimized, but bounded to what the author chose to include. Not exhaustive.
  • Markdown sitemap — exhaustive like XML but titled and hierarchical like llms.txt; positioned between the two.

Canonical Vercel shapes

Vercel ships two variants in the 2026-04-21 post:

Flat, date-sorted: used for blog posts, where reverse-chronological ordering is the natural structure. A single-level markdown list:

# Blog sitemap

- [How we made global routing faster with Bloom filters](/blog/how-we-made-global-routing-faster-with-bloom-filters.md)
- [Inside Workflow DevKit: how framework integrations work](/blog/inside-workflow-devkit-how-framework-integrations-work.md)
- ...
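A minimal sketch of the flat variant follows. It assumes each post is available as a hypothetical BlogPost record with title, slug, and an ISO-8601 date. These field names and the /blog/<slug>.md URL shape are illustrative assumptions, not Vercel's code.

```typescript
// Hypothetical post record; field names are illustrative assumptions.
interface BlogPost {
  title: string;
  slug: string;
  date: string; // ISO-8601, e.g. "2026-04-21"
}

// Escape square brackets so a title cannot break the [text](href) link syntax.
function escapeLinkText(text: string): string {
  return text.replace(/[[\]]/g, "\\$&");
}

// Render a flat, reverse-chronological markdown sitemap.
function renderBlogSitemap(posts: BlogPost[]): string {
  // Copy before sorting so the caller's array is not mutated; newest first.
  const sorted = [...posts].sort((a, b) => b.date.localeCompare(a.date));
  const lines = sorted.map((p) => `- [${escapeLinkText(p.title)}](/blog/${p.slug}.md)`);
  return ["# Blog sitemap", "", ...lines].join("\n") + "\n";
}
```

Comparing the date strings directly works because ISO-8601 dates in a single fixed format sort lexicographically. Timestamps in any other format would need to be parsed first.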

Hierarchical, nested: used for docs, where parent-child grouping is load-bearing. A recursive renderer preserves nesting with a two-space indent per level:

# Documentation sitemap

- [Getting Started](/docs/getting-started)
  - [Installation](/docs/getting-started/installation)
  - [First Deployment](/docs/getting-started/first-deployment)
- [Concepts](/docs/concepts)
  - [Projects](/docs/concepts/projects)
  - [Deployments](/docs/concepts/deployments)
...

The canonical recursive-renderer shape is five lines of TypeScript (see sources/2026-04-21-vercel-making-agent-friendly-pages-with-content-negotiation for full code). The pattern generalises to any static-site generator with a tree of pages: Docusaurus, VitePress, Next.js MDX, Astro, etc.
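The source's own renderer lives in the cited post and is not reproduced here. As a stand-in, here is a sketch of the same recursive idea. It assumes a hypothetical PageNode tree (title, href, optional children) into which a site generator's sidebar or route manifest would be mapped.

```typescript
// Hypothetical tree node; a generator's sidebar or route manifest would be mapped into this.
interface PageNode {
  title: string;
  href: string;
  children?: PageNode[];
}

// Depth-first walk: emit the node's own list item, then recurse into its
// children one level deeper. Indentation is two spaces per level.
function renderTree(nodes: PageNode[], depth = 0, out: string[] = []): string[] {
  for (const node of nodes) {
    out.push(`${"  ".repeat(depth)}- [${node.title}](${node.href})`);
    renderTree(node.children ?? [], depth + 1, out);
  }
  return out;
}

function renderDocsSitemap(tree: PageNode[]): string {
  return ["# Documentation sitemap", "", ...renderTree(tree)].join("\n") + "\n";
}
```

The depth argument is the only state the recursion carries. That is why the same function works for any generator that can produce a tree of pages.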

Why hierarchy matters for agents

An agent that fetches /docs/sitemap.md and sees

- [Databases](/docs/databases)
  - [Vercel Postgres](/docs/databases/vercel-postgres)
  - [Neon](/docs/databases/neon)
  - [Supabase](/docs/databases/supabase)

now knows that "Neon" is a child of "Databases" — useful context that flat XML sitemaps cannot carry. For queries like "what database options does this site document?", the agent can answer from the sitemap alone without fetching any individual page. The parent-child relationship is the semantic primitive that makes a sitemap more than a URL list.
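To make this concrete, here is a sketch of the agent side: rebuild the parent-child tree from the indentation, then read off the children of a section. It assumes the two-space-per-level indentation of the Vercel examples. The SitemapEntry type and both function names are hypothetical.

```typescript
interface SitemapEntry {
  title: string;
  href: string;
  children: SitemapEntry[];
}

// One list item: optional indent, "-" or "*", then [Title](href).
const ENTRY = /^(\s*)[-*] \[([^\]]+)\]\(([^)\s]+)\)\s*$/;

// Rebuild the tree from indentation, assuming two spaces per level.
// Non-link lines (headings, blank lines, "...") are skipped.
function parseSitemap(markdown: string): SitemapEntry[] {
  const roots: SitemapEntry[] = [];
  const stack: SitemapEntry[] = []; // stack[d] is the latest entry seen at depth d
  for (const line of markdown.split("\n")) {
    const m = ENTRY.exec(line);
    if (!m) continue;
    // Clamp over-indented items so they attach to the deepest known ancestor.
    const depth = Math.min(Math.floor(m[1].length / 2), stack.length);
    const entry: SitemapEntry = { title: m[2], href: m[3], children: [] };
    stack.length = depth; // drop the previous entry at this depth and anything deeper
    if (depth === 0) roots.push(entry);
    else stack[depth - 1].children.push(entry);
    stack[depth] = entry;
  }
  return roots;
}

// Depth-first search for the first entry whose title matches exactly.
function findEntry(tree: SitemapEntry[], title: string): SitemapEntry | undefined {
  for (const node of tree) {
    if (node.title === title) return node;
    const hit = findEntry(node.children, title);
    if (hit) return hit;
  }
  return undefined;
}
```

With this in place, answering "what database options does this site document?" is a lookup of findEntry(tree, "Databases").children, with no page fetches.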

Composition with concepts/markdown-content-negotiation

Markdown sitemaps and markdown content negotiation are complementary, not competing:

  • Markdown sitemap answers "what URLs exist?"
  • Markdown content negotiation answers "get me this URL as markdown"

An agent that lands on the markdown sitemap reads the list, picks candidate URLs by matching titles, and fetches each one with Accept: text/markdown for efficient consumption. Together the two primitives give the agent cheap discovery plus cheap retrieval. Cloudflare supports the same workflow through different primitives (llms.txt plus /index.md).
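Here is a sketch of that two-step workflow. It assumes a naive title match (every query word must appear in the link title) and an injected fetch function, so it can run without network access. The function names and the FetchLike type are hypothetical; only the Accept: text/markdown header comes from the text. Links are resolved against the sitemap's own URL, so both relative and absolute hrefs work.

```typescript
// Minimal fetch shape, so the sketch needs neither DOM nor Node typings;
// the global fetch in Node 18+ and browsers should fit it.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

interface Candidate {
  title: string;
  href: string;
}

// Discovery: keep links whose title contains every query word (case-insensitive).
// An empty query matches every link.
function pickCandidates(sitemap: string, query: string): Candidate[] {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const linkRe = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  const hits: Candidate[] = [];
  let m: RegExpExecArray | null;
  while ((m = linkRe.exec(sitemap)) !== null) {
    const title = m[1].toLowerCase();
    if (words.every((w) => title.indexOf(w) !== -1)) {
      hits.push({ title: m[1], href: m[2] });
    }
  }
  return hits;
}

// Retrieval: request the markdown representation of one URL.
async function fetchAsMarkdown(url: string, fetchImpl: FetchLike): Promise<string> {
  const res = await fetchImpl(url, { headers: { Accept: "text/markdown" } });
  if (!res.ok) throw new Error(`GET ${url} failed with status ${res.status}`);
  return res.text();
}

// Both steps together. hrefs are resolved against the sitemap's own URL,
// so relative (/docs/x, x.md) and absolute (https://...) links both work.
async function retrieveMatching(
  sitemapUrl: string,
  sitemap: string,
  query: string,
  fetchImpl: FetchLike,
): Promise<Record<string, string>> {
  const pages: Record<string, string> = {};
  for (const c of pickCandidates(sitemap, query)) {
    // Sequential on purpose: simple, and gentle on the server.
    pages[c.title] = await fetchAsMarkdown(new URL(c.href, sitemapUrl).href, fetchImpl);
  }
  return pages;
}
```

The fetches run one at a time here for simplicity. A real crawler would more likely cap concurrency than serialize completely.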

Differences from alternatives

| Primitive | Format | Hierarchy | Titles | Exhaustive | Agent-parseable |
|---|---|---|---|---|---|
| XML sitemap | XML | No | No | Yes | Machine-only |
| llms.txt | markdown | Yes | Yes | Curated | LLM-friendly |
| Markdown sitemap | markdown | Yes | Yes | Yes | LLM-friendly |
| robots.txt Sitemap: directive | text | No | No | Points at XML | Machine-only |

Markdown sitemaps and llms.txt are the two LLM-friendly primitives in the table; the markdown sitemap trades llms.txt's curation for exhaustive coverage.

Failure modes and caveats

  • Context-window overflow on large sites. A site with 100,000 docs pages produces a sitemap too large for a single LLM read. Mitigation: split by section (a per-product /docs/<product>/sitemap.md), mirroring the split-llms.txt-per-subdirectory pattern used for llms.txt files.
  • Staleness without rebuild. Vercel's implementation uses export const dynamic = 'force-static', so the sitemap is built statically and CMS content changes surface only after a revalidation or rebuild. For rapidly changing content, switch to a dynamic route handler or a short-TTL remote cache.
  • No lastmod metadata equivalent. XML sitemaps carry <lastmod> per URL; markdown sitemaps don't have a standard way to encode it. An agent that wants freshness metadata has to either fetch each page or fall back to the XML sitemap.
  • Path vs URL ambiguity in links. Markdown links in a sitemap may be relative (/blog/post.md) or absolute (https://vercel.com/blog/post.md). Agents typically handle both, but server-side middleware that rewrites paths should be consistent about which shape the sitemap uses.
  • No query-string / fragment semantics. A URL like /blog/post#section-2 is a page anchor; the sitemap typically only lists page-level URLs. Sub-page navigation requires the agent to fetch the page and parse it.
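For the context-window caveat, here is a sketch of the split-by-section mitigation. It groups pages by the first path segment under a base path, emits one sitemap per section, and keeps a small top-level index that links to them. The DocPage type, the function names, and the /docs/<section>/sitemap.md layout are assumptions for illustration. The per-section bodies are flat here, though each could be rendered hierarchically instead.

```typescript
// Hypothetical page record: a title plus a path, e.g. "/docs/functions/runtimes".
interface DocPage {
  title: string;
  path: string;
}

// Group pages by the first path segment below base ("" for the base page itself).
// Pages outside base are ignored.
function splitBySection(pages: DocPage[], base: string): Map<string, DocPage[]> {
  const sections = new Map<string, DocPage[]>();
  for (const page of pages) {
    if (page.path !== base && !page.path.startsWith(base + "/")) continue;
    const section = page.path.slice(base.length + 1).split("/")[0];
    const bucket = sections.get(section) ?? [];
    bucket.push(page);
    sections.set(section, bucket);
  }
  return sections;
}

// Returns a map from sitemap URL to markdown body: one file per section,
// plus a small index at <base>/sitemap.md that links to them.
function buildSplitSitemaps(pages: DocPage[], base = "/docs"): Map<string, string> {
  const files = new Map<string, string>();
  const index: string[] = ["# Documentation sitemap index", ""];
  splitBySection(pages, base).forEach((items, section) => {
    if (section === "") {
      // The base page itself is listed directly in the index.
      for (const p of items) index.push(`- [${p.title}](${p.path})`);
      return;
    }
    const url = `${base}/${section}/sitemap.md`;
    index.push(`- [${section}](${url})`);
    const body = items.map((p) => `- [${p.title}](${p.path})`);
    files.set(url, [`# ${section} sitemap`, "", ...body].join("\n") + "\n");
  });
  files.set(`${base}/sitemap.md`, index.join("\n") + "\n");
  return files;
}
```

An agent reads the small index first and fetches only the section sitemap it needs, so no single read has to hold the whole site.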

Discovery

How does an agent find the markdown sitemap? Three options:

  1. Advertise in llms.txt — the llms.txt file can contain a link to /blog/sitemap.md.
  2. link rel="alternate" in HTML <head> — the Vercel pattern (see patterns/link-rel-alternate-markdown-discovery).
  3. Convention — agents probe /sitemap.md, /<section>/sitemap.md, or /docs/sitemap.md by default when looking for agent-friendly indexes.

No formal standard has landed as of 2026-04; discovery is by convention and by advertisement through llms.txt or link rel tags.
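Here is a sketch that combines the three discovery routes into an ordered list of candidate URLs to probe. Links advertised in llms.txt come first, then link rel="alternate" tags of type text/markdown, then conventional paths. The regex-based tag scan is not a real HTML parser, and the default section list (docs, blog) is hypothetical. Keeping only hrefs that end in sitemap.md is also an assumption of this sketch.

```typescript
// Read a quoted attribute value from a single tag string (quoted values only).
function attr(tag: string, name: string): string | undefined {
  const m = new RegExp(`\\b${name}\\s*=\\s*["']([^"']*)["']`, "i").exec(tag);
  return m ? m[1] : undefined;
}

// hrefs of <link rel="alternate" type="text/markdown" href="..."> tags.
// Regex scan, not a real HTML parser: good enough for a sketch.
function markdownAlternates(html: string): string[] {
  const hrefs: string[] = [];
  const tagRe = /<link\b[^>]*>/gi;
  let tag: RegExpExecArray | null;
  while ((tag = tagRe.exec(html)) !== null) {
    const rels = (attr(tag[0], "rel") ?? "").toLowerCase().split(/\s+/);
    const type = (attr(tag[0], "type") ?? "").toLowerCase();
    const href = attr(tag[0], "href");
    if (rels.indexOf("alternate") !== -1 && type === "text/markdown" && href) {
      hrefs.push(href);
    }
  }
  return hrefs;
}

// Ordered, de-duplicated candidate sitemap URLs, resolved against pageUrl.
function sitemapCandidates(
  pageUrl: string,
  html: string,
  llmsTxt: string,
  sections: string[] = ["docs", "blog"], // hypothetical default section names
): string[] {
  const isSitemap = (h: string) => /sitemap\.md$/i.test(h);

  // 1. Markdown links in llms.txt that point at a sitemap.md.
  const fromLlms: string[] = [];
  const linkRe = /\]\(([^)\s]+)\)/g;
  let m: RegExpExecArray | null;
  while ((m = linkRe.exec(llmsTxt)) !== null) {
    if (isSitemap(m[1])) fromLlms.push(m[1]);
  }

  // 2. <link rel="alternate" type="text/markdown"> tags in the page head.
  const fromHead = markdownAlternates(html).filter(isSitemap);

  // 3. Conventional paths to probe when nothing is advertised.
  const conventional = ["/sitemap.md", ...sections.map((s) => `/${s}/sitemap.md`)];

  const all = [...fromLlms, ...fromHead, ...conventional].map((h) => new URL(h, pageUrl).href);
  return all.filter((u, i) => all.indexOf(u) === i); // keep first occurrence only
}
```

Advertised locations come first because they reflect the site owner's intent. The conventional paths are only a fallback guess.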
