CONCEPT Cited by 2 sources

Canonical tag

Definition

The canonical tag is an HTML element, <link rel="canonical" href="...">, whose link relation is defined in RFC 6596 (The Canonical Link Relation, 2012). It tells search engines and automated systems which URL represents the authoritative version of a page. It was introduced to resolve duplicate content: when multiple URLs render the same or near-identical page (print view, tracking-parameter URLs, paginated listings, http-vs-https, mobile-vs-desktop host splits), each variant declares the same canonical URL, which keeps the search-engine index coherent.

Simple example placed in <head>:

<link rel="canonical" href="https://example.com/docs/current">

The tag is declarative — it does not rewrite the URL the client sees; it only asserts the canonical URL. A conforming search engine treats the canonical URL as the representative one when deciding which result to surface.
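Because the tag is only a declaration in the markup, any consumer has to parse the HTML to read it. The following is a minimal sketch of that step using Python's standard-library html.parser; the names CanonicalFinder and find_canonical are hypothetical, and the sketch assumes the first canonical link in the document is the one that counts (it does not check that the tag sits inside <head>).

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated, case-insensitive list of tokens.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical href, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = (
    '<html><head>'
    '<link rel="canonical" href="https://example.com/docs/current">'
    '</head><body></body></html>'
)
print(find_canonical(page))  # https://example.com/docs/current
```

Nothing in this sketch changes what the client is served; it only reads the assertion the page makes.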

Ubiquity

Per the 2025 Web Almanac, canonical tags are present on 65-69 % of web pages. Platforms like EmDash, WordPress, and Contentful emit them automatically — they are near-universal infrastructure for SEO-aware CMS output.

Three canonical-tag shapes

  1. Self-referencing canonical — the canonical URL equals the page's own URL. Common default; essentially saying "this URL is my canonical URL." Used by most CMSes as a no-op-but-explicit declaration.
  2. Same-origin non-self-referencing canonical — the canonical URL is a different URL on the same origin. Typical when a page is deprecated and points at its successor, or when multiple URLs serve the same content (e.g. /products?utm_source=email canonicalises to /products).
  3. Cross-origin canonical — the canonical URL is on a different origin. Used for domain consolidations, syndication, or mirror sites.
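The three shapes can be told apart mechanically. The sketch below is one way to do it with urllib.parse; classify_canonical and its return labels are hypothetical names, "origin" is taken to mean the (scheme, host, port) triple, and "self-referencing" is assumed to mean an identical path and query once fragments are ignored. Under that strict reading of origin, an http page pointing at its https twin counts as cross-origin.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """(scheme, host, port) triple, with default ports filled in."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    return (scheme, parts.hostname or "", parts.port or DEFAULT_PORTS.get(scheme))


def classify_canonical(page_url: str, canonical_href: str) -> str:
    """Label a canonical href relative to the page that declares it."""
    # Resolve relative hrefs such as "/products" against the page URL.
    canonical_url = urljoin(page_url, canonical_href)
    if origin(canonical_url) != origin(page_url):
        return "cross-origin"
    page, canon = urlsplit(page_url), urlsplit(canonical_url)
    # Assumption: same path and query (fragment ignored) means "same URL".
    same_resource = (page.path or "/", page.query) == (canon.path or "/", canon.query)
    return "self-referencing" if same_resource else "same-origin-non-self"


print(classify_canonical("https://example.com/docs/current",
                         "https://example.com/docs/current"))
# self-referencing
print(classify_canonical("https://example.com/products?utm_source=email",
                         "/products"))
# same-origin-non-self
print(classify_canonical("https://example.com/a", "https://other.example/a"))
# cross-origin
```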

Agent-era reuse

Cloudflare's Redirects for AI Training (2026-04-17) is the canonical wiki instance of using the tag for agent-era content-policy enforcement rather than its original SEO purpose. Mechanism:

  1. On each AI-training-crawler request, Cloudflare reads the origin response HTML.
  2. If a <link rel="canonical"> tag exists, is same-origin, and is non-self-referencing, Cloudflare returns HTTP 301 Moved Permanently to the canonical URL instead of serving the original body.
  3. All other clients (humans, search engines, AI Agents, AI Search) pass through untouched.
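The three steps above can be sketched as a single decision function. This is an illustration of the described logic only, not Cloudflare's implementation: redirect_target, the "ai-training" category label, and the way the caller classifies the client are all assumptions made for the example.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


class _CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if tag == "link" and self.href is None and "canonical" in rel_tokens:
            self.href = attr.get("href") or None


def _origin(url: str) -> tuple:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    return (scheme, parts.hostname or "", parts.port or DEFAULT_PORTS.get(scheme))


def redirect_target(client_category: str, request_url: str,
                    origin_html: str) -> Optional[str]:
    """Return the Location for a 301, or None to pass the response through.

    client_category is a hypothetical label supplied by the caller; how a
    request is recognised as an AI training crawler is out of scope here.
    """
    if client_category != "ai-training":
        return None  # step 3: every other client is untouched
    finder = _CanonicalFinder()
    finder.feed(origin_html)  # step 1: read the origin response HTML
    if finder.href is None:
        return None  # no canonical tag: nothing to act on
    target = urljoin(request_url, finder.href)
    if _origin(target) != _origin(request_url):
        return None  # cross-origin canonicals are excluded
    here, there = urlsplit(request_url), urlsplit(target)
    if (here.path or "/", here.query) == (there.path or "/", there.query):
        return None  # self-referencing: a redirect would loop
    return target  # step 2: answer with 301 Moved Permanently to this URL


old_page = '<head><link rel="canonical" href="/docs/current"></head>'
print(redirect_target("ai-training", "https://example.com/docs/v1", old_page))
# https://example.com/docs/current
print(redirect_target("human", "https://example.com/docs/v1", old_page))
# None
```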

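The declarative substrate means origins get the behaviour with zero new configuration, because their existing SEO infrastructure already emits the signal. The 65-69 % of web pages that carry a canonical tag are pre-wired for this, although the redirect only fires where the tag is same-origin and non-self-referencing.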

Self-referencing and cross-origin canonicals are excluded by design: the former would create infinite redirect loops, while the latter is typically used for domain consolidation rather than content freshness.

Why it works better than robots.txt / noindex for deprecated content

robots.txt blocks crawling but doesn't say "here's the up-to-date version instead" — the bot gets a void. noindex similarly signals "don't index" without direction. The canonical tag is the only widely-deployed HTML primitive that points at the right answer, which is exactly what an AI training crawler needs to be redirected to a fresher page.
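The difference in information content is easy to see in code. In the sketch below, a robots.txt check and a noindex check each yield a yes/no answer, while the canonical tag yields a URL. The crawler name ExampleTrainingBot, the HeadSignals class, and the sample rules and markup are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class HeadSignals(HTMLParser):
    """Collects the noindex directive and the canonical href from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = (attr.get("content") or "").lower()
            directives = [d.strip() for d in content.split(",")]
            self.noindex = self.noindex or "noindex" in directives
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = self.canonical or attr.get("href")


# robots.txt: the only answer available is "may I fetch this?"
robots = RobotFileParser()
robots.parse(["User-agent: ExampleTrainingBot", "Disallow: /docs/v1"])
allowed = robots.can_fetch("ExampleTrainingBot", "https://example.com/docs/v1")

# noindex and canonical: both live in the page markup.
signals = HeadSignals()
signals.feed(
    '<head><meta name="robots" content="noindex">'
    '<link rel="canonical" href="https://example.com/docs/current"></head>'
)

print(allowed)            # False
print(signals.noindex)    # True
print(signals.canonical)  # https://example.com/docs/current
```

Only the last value tells the crawler where to go instead.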

Caveats

  • 31-35 % of web pages have no canonical tag. On those pages, Redirects for AI Training (and any other canonical-tag-based mechanism) is a no-op.
  • Self-referencing canonicals are common defaults. Redirects for AI Training won't trigger on them by design; a site that wants the behaviour on a deprecated page must set that page's canonical to the URL of the current version.
  • The tag is advisory, not enforcement. On its own, it's just a hint. The enforcement step (edge server translating the tag to a 301) is done by Cloudflare — without an edge-proxy layer, the advisory tag does not cause a redirect.
  • SEO semantics may not perfectly match agent-redirect semantics. Sites that historically used the canonical tag for domain consolidation rather than content freshness may find Redirects for AI Training's behaviour on those pages unexpected — this is partly why cross-origin canonicals are excluded from the mechanism.
