SYSTEM Cited by 2 sources
Redirects for AI Training¶
Overview¶
Redirects for AI Training
(developers.cloudflare.com/ai-crawl-control/reference/redirects-for-ai-training)
is a Cloudflare feature inside AI
Crawl Control that detects verified AI-training crawlers
and issues an HTTP 301 Moved Permanently to the page's
canonical URL (per the page's own
<link rel="canonical"> tag, RFC 6596)
before the body is served — so LLM training datasets get
Cloudflare's freshest implementation details rather than the
deprecated original.
Launched 2026-04-17 at blog.cloudflare.com/ai-redirects,
as a one-toggle feature on all paid Cloudflare plans. First
dogfooded on developers.cloudflare.com
— deprecated Wrangler v1 and Workers Sites paths now return
301 to the current v2 / equivalent paths for verified AI
crawlers.
Problem it solves¶
Deprecated documentation poses a specific LLM-training data problem:
- A human reading the page sees a large "this is deprecated, use v2" banner at the top and cognitively discounts the below-banner prose accordingly.
- An LLM training crawler ingests the prose without the cross-page cognitive discount — so the training set teaches the model "Wrangler v1 commands are the answer" when asked about Wrangler.
- Advisory signals don't stop them. Cloudflare's own docs
for legacy Wrangler v1 carried the deprecation banner, the
noindexmeta tag (concepts/noindex-meta-tag), and canonical tags pointing at v2. AI Crawl Control telemetry showed the AI Crawler category visited 4.8 M times in 30 days and "consumed deprecated content at the same rate as current content. The advisory signals made no measurable difference." (Source: sources/2026-04-17-cloudflare-redirects-for-ai-training-enforces-canonical-content) - Ground truth: in April 2026 a leading AI assistant
answered "How do I write KV values using the Wrangler
CLI?" with the deprecated
wrangler kv:key putsyntax — the colon form was deprecated in Wrangler 3.60.0; the current syntax iswrangler kv key put.
Result: LLMs recommend outdated tooling; users act on outdated recommendations; Cloudflare customers hit regressions discovering their agent-generated code uses deprecated APIs.
Mechanism¶
Three inputs + one decision:
cf.verified_bot_category— Cloudflare's Ruleset-Engine field that categorises incoming bot requests. Redirects for AI Training keys on AI Crawler (docs) — GPTBot, ClaudeBot, Bytespider, etc. Distinct from AI Assistant (human-directed agents) and AI Search (AI-powered search indexers), which are not redirected.<link rel="canonical" href="...">tag in the origin's response HTML. Cloudflare reads the response body to locate the tag.- Exclusion filters. Self-referencing canonicals (infinite-loop guard) and cross-origin canonicals (scope discipline — "often used for domain consolidation rather than content freshness") don't trigger redirects.
If the request is an AI Crawler, the canonical tag is present, non-self-referencing, and same-origin, Cloudflare returns:
before the response body is served. Example exchange from the launch post:
GET /durable-objects/api/legacy-kv-storage-api/
Host: developers.cloudflare.com
User-Agent: Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)
HTTP/1.1 301 Moved Permanently
Location: https://developers.cloudflare.com/durable-objects/api/sqlite-storage-api/
Human traffic, search indexing, AI Assistant, and AI Search all pass through unchanged.
7-day dogfood result¶
Cloudflare enabled Redirects for AI Training on developers.cloudflare.com. After 7 days:
"100 % of AI training crawler requests to pages with non-self-referencing canonical tags were redirected and were not served with deprecated content."
The post is careful to separate the measured redirect rate (100 %, deterministic) from the hypothesised AI-answer-quality lift ("given the closed nature of training pipelines and variability in recrawl timing, this is a hypothesis we will continue to verify"). What the crawler receives at the point of access has improved immediately; what future models say is a function of each training pipeline's recrawl cadence.
Why the intervention is leveraged at training time¶
The alternative is inference-time correction: patch every agent's prompt to "always prefer Wrangler v2"; filter RAG results; or retrain the LLM. All are expensive, partial, and don't address offline agent use or future models.
Redirecting training-time crawlers fixes the source of truth for future models. Once the next training run happens, the model naturally prefers v2 — no prompt engineering needed. High leverage: one Cloudflare-side config change → every future LLM model's answer quality improves for Cloudflare-related topics.
Why not single redirect rules?¶
Cloudflare's 2026-04-17 post explicitly addresses Single Redirect Rules as the alternative:
- Per-path rules don't scale — every new deprecated path requires a rule change.
- User-agents must be manually tracked; crawler user-agent lists drift.
- Plan-limit rule quota is finite and competes with campaign URLs, domain migrations.
- "Manually re-encode[s] what canonical tags already declare and fall[s] out of sync as content changes."
The canonical-tag substrate couples crawler-redirect policy to the same infrastructure that already keeps SEO and RAG-retrieval-in-HTML in sync — one source of truth, not two. See patterns/canonical-tag-as-crawler-redirect.
Ubiquity of the substrate¶
The 2025 Web Almanac SEO chapter
reports 65-69 % of web pages carry <link rel="canonical">
tags. Platforms like
EmDash,
WordPress, and Contentful emit them automatically. Most
Cloudflare customers therefore get the behaviour for free once
the toggle is flipped — no new authoring step is required.
Dashboard path¶
"In the dashboard: on any domain, go to *AI Crawl Control
Quick Actions > Redirects for AI training > toggle on*."
Path-specific control requires Configuration Rules and, for multi-tenant deployments, Cloudflare for SaaS.
Radar surface¶
Aggregate adoption + its cousin measurements surface on
Cloudflare Radar's new
Response Status Code Analysis
AI-Insights graph — filterable by industry and crawl-purpose,
with per-bot detail on the bot directory
pages. Representative aggregate: ~11.3 % 3xx responses
to AI crawlers (redirects in total); representative per-GPTBot:
~5.1 % 3xx. Over time, the 3xx bucket is expected to
grow as more origins enable Redirects for AI Training.
Trade-offs¶
- Different content for different classes. Origin unilaterally decides; the same mechanism used maliciously could hide unfavorable content from training-crawler classes. Trust depends on the operator; no standard transparency disclosure exists.
- Doesn't fix already-ingested training data. Models trained before the toggle was enabled still have the deprecated content baked in; this intervention only affects future crawl ingests.
- Unverified crawlers bypass. Only requests classified
as
cf.verified_bot_category == AI Crawlerare redirected. Adversarial or unclassified scrapers are unaffected. - No-ops on 31-35 % of pages. Pages without canonical tags get no redirect — the feature piggybacks on existing SEO infrastructure.
- Self-referencing canonicals are common defaults. Many CMSes emit self-referencing canonicals; deprecated pages must actively set a different canonical URL for the redirect to fire.
- Cross-origin canonicals excluded by design. Pages with canonicals pointing at other domains are not redirected.
- Depends on crawlers honouring
301. Modern HTTP clients do; adversarial scrapers may not.
Relationship to other Cloudflare content-routing primitives¶
- AI Crawl Control — the parent product inside which this feature ships. Telemetry for it lives in AI Crawl Control's dashboard.
- Pay Per Crawl — same edge
classification layer, different status code (
402), different policy intent (monetise instead of redirect). - WAF + bot-management rules
— classical hard-block layer (
403); Redirects for AI Training runs alongside / below. - Content Signals —
publisher-side preference declaration
(
ai-train=no,ai-input=yes, etc.); Redirects for AI Training is an enforcement mechanism a site can pair with the declaration. Content Signals is advisory; this is enforcement. - patterns/response-status-as-content-policy — the
broader framing:
301for canonical-routing,402for pay-per-crawl,403for block — HTTP status codes as the policy-enforcement substrate.
Seen in¶
- sources/2026-04-17-cloudflare-redirects-for-ai-training-enforces-canonical-content
— canonical primary-source (2026-04-17 launch post).
Supplies the mechanism (
cf.verified_bot_category+ HTML canonical-tag parsing →301), the exclusion filters (self-referencing, cross-origin), the 7-day 100 % redirect-rate dogfood on developers.cloudflare.com, thewrangler kv:key putground-truth inference failure, the why-not-redirect-rules analysis, and the per-category verified-bot distinction (AI Crawler vs AI Assistant vs AI Search). - sources/2026-04-17-cloudflare-introducing-the-agent-readiness-score-is-your-site-agent-ready
— companion Agents-Week post that first mentioned the
feature as one dogfood mechanism on developers.cloudflare.com
(deprecated Wrangler v1 + Workers Sites docs) alongside
the split
llms.txt, dynamic/index.md, and hidden agent directives — without launch-post mechanism detail.
Related¶
- companies/cloudflare — parent company.
- systems/ai-crawl-control — parent product.
- concepts/agent-training-crawler-redirect — concept.
- concepts/canonical-tag — the HTML primitive the feature binds to.
- concepts/noindex-meta-tag — adjacent advisory primitive shown to be insufficient for AI training crawlers.
- patterns/agent-training-crawler-redirect — parent pattern (redirect-training-crawlers in general).
- patterns/canonical-tag-as-crawler-redirect — the declarative-via-HTML-canonical-tag specific variant.
- patterns/response-status-as-content-policy — broader HTTP-status-code-as-policy framing.
- systems/web-bot-auth — cooperating-crawler classification substrate.
- systems/cloudflare-developer-documentation — reference deployment.
- systems/cloudflare-radar — Response Status Code
Analysis surface measuring aggregate + per-bot
3xx(redirect) rates across AI-crawler traffic.