CONCEPT Cited by 2 sources

Canonical tag¶

Definition¶

The canonical tag is an HTML element — <link rel="canonical" href="..."> — defined in RFC 6596 (The Canonical Link Relation, 2012) that tells search engines and automated systems which URL represents the authoritative version of a page. It was introduced to resolve duplicate content: multiple URLs rendering the same or near-identical page (print view, tracking-parameter URLs, paginated listings, http-vs-https, mobile-vs-desktop host splits) all declare one canonical URL, and the search-engine index is kept coherent.

Simple example placed in <head>:

<link rel="canonical" href="https://example.com/docs/current">

The tag is declarative — it does not rewrite the URL the client sees; it only asserts the canonical URL. A conforming search engine treats the canonical URL as the representative one when deciding which result to surface.

Ubiquity¶

Per the 2025 Web Almanac, canonical tags are present on 65-69 % of web pages. Platforms like EmDash, WordPress, and Contentful emit them automatically — they are near-universal infrastructure for SEO-aware CMS output.

Three canonical-tag shapes¶

Self-referencing canonical — the canonical URL equals the page's own URL. Common default; essentially saying "this URL is my canonical URL." Used by most CMSes as a no-op-but-explicit declaration.
Same-origin non-self-referencing canonical — the canonical URL is a different URL on the same origin. Typical when a page is deprecated and points at its successor, or when multiple URLs serve the same content (e.g. /products?utm_source=email canonicalises to /products).
Cross-origin canonical — the canonical URL is on a different origin. Used for domain consolidations, syndication, or mirror sites.

Agent-era reuse¶

Cloudflare's Redirects for AI Training (2026-04-17) is the canonical wiki instance of using the tag for agent-era content-policy enforcement rather than its original SEO purpose. Mechanism:

On each AI-training-crawler request, Cloudflare reads the origin response HTML.
If a <link rel="canonical"> tag exists, is same-origin, and is non-self-referencing, Cloudflare returns HTTP 301 Moved Permanently to the canonical URL instead of serving the original body.
All other clients (humans, search engines, AI Agents, AI Search) pass through untouched.

The declarative substrate means origins get the behaviour with zero new configuration — their existing SEO infrastructure already emitted the signal. 65-69 % of web pages are pre-wired for this.

Self-referencing and cross-origin canonicals are excluded by design — the former would create infinite redirect loops, the latter is typically used for domain consolidation rather than content freshness.

Why it works better than `robots.txt` / `noindex` for¶

deprecated content¶

robots.txt blocks crawling but doesn't say "here's the up-to-date version instead" — the bot gets a void. noindex similarly signals "don't index" without direction. The canonical tag is the only widely-deployed HTML primitive that points at the right answer, which is exactly what an AI training crawler needs to be redirected to a fresher page.

Caveats¶

31-35 % of web pages have no canonical tag. On those pages, Redirects for AI Training (and any other canonical-tag-based mechanism) is a no-op.
Self-referencing canonicals are common defaults. Redirects for AI Training won't trigger on them by design; sites that want the behaviour need different canonical URLs pointing at the real current version.
The tag is advisory, not enforcement. On its own, it's just a hint. The enforcement step (edge server translating the tag to a 301) is done by Cloudflare — without an edge-proxy layer, the advisory tag does not cause a redirect.
SEO semantics may not perfectly match agent-redirect semantics. Sites that historically used the canonical tag for domain consolidation rather than content freshness may find Redirects for AI Training's behaviour on those pages unexpected — this is partly why cross-origin canonicals are excluded from the mechanism.

Seen in¶

sources/2026-04-17-cloudflare-redirects-for-ai-training-enforces-canonical-content — canonical wiki instance of agent-era reuse. The post is the launch of Cloudflare's Redirects for AI Training, which binds the HTTP-301-redirect mechanism directly to the canonical tag.
sources/2026-04-17-cloudflare-introducing-the-agent-readiness-score-is-your-site-agent-ready — earlier 2026-04-17 post that mentioned the canonical-tag-based redirect as one dogfood mechanism on developers.cloudflare.com without full launch-post mechanism details.

systems/redirects-for-ai-training — the feature that operationalises canonical tags as crawler redirects.
systems/ai-crawl-control — parent product that gates the feature.
patterns/canonical-tag-as-crawler-redirect — the pattern generalised beyond Cloudflare.
concepts/robots-txt — adjacent crawler-policy primitive (crawl-rule authoring); canonical tags are a per-page authoritative-URL primitive.
concepts/noindex-meta-tag — adjacent crawler-policy primitive (index-exclusion); canonical tags are the positive-direction primitive.