Skip to content

SYSTEM Cited by 1 source

AI Crawl Control

Overview

AI Crawl Control (developers.cloudflare.com/ai-crawl-control) is a Cloudflare dashboard + ruleset surface that gives origin operators per-category policy over AI crawler traffic to their sites. It sits atop Cloudflare's verified-bot classification and exposes each crawler class through the cf.verified_bot_category Ruleset-Engine field — so declarative policy for "what do I do with AI training crawlers?" attaches to the same substrate as the rest of Cloudflare's bot-management stack.

Three verified-bot categories it distinguishes

The post draws an explicit, operationally-load-bearing distinction between three categories that are otherwise easy to conflate:

  • AI Crawler — bots that crawl for AI model training. Named instances: GPTBot, ClaudeBot, Bytespider. Observed on developers.cloudflare.com at ~4.8 M visits / 30 days (2026-04 telemetry). This is the category Redirects for AI Training keys on.
  • AI Assistant — bots acting on behalf of human users during an interactive session (fetching linked URLs, loading pages to summarise). Separate category; not redirected by Redirects for AI Training.
  • AI Search — bots indexing content for AI-powered search products. Also separate; not redirected.

The per-category split is the reason the feature can "redirect AI-training-crawler requests to current content while AI Agents visiting deprecated pages will not be redirected" — AI Agents and AI Search fall under different verified-bot categories.

Features surfaced inside AI Crawl Control

  • Telemetry and reporting — first-party visibility into which AI crawlers are hitting a zone, what they're fetching, and how often. Load-bearing for the "advisory signals made no measurable difference" finding on developers.cloudflare.com: the noindex tag was present, the canonical tags were present, the deprecation banners were present, and AI Crawl Control telemetry showed crawlers were hitting deprecated content at the same rate as current content anyway.
  • Redirects for AI Training — Cloudflare's 2026-04-17 launch feature. Canonical-tag → HTTP 301 translation for verified AI crawlers. One toggle, all paid plans.
  • Pay Per Crawl — adjacent category-aware policy: return HTTP 402 Payment Required to the AI Crawler category until the publisher is paid.
  • Quick Actions — the specific UI path for the toggle is "AI Crawl Control > Quick Actions > Redirects for AI training".

Relationship to broader Cloudflare stack

AI Crawl Control is one of three tiers of AI-crawler control:

  1. WAF + bot-management rules — hard block (403). Adversarial / unverified crawlers. Named in the 2026-04-17 post as the enforcement tier paired with Content Signals ai-train=no declarations.
  2. AI Crawl Control (this system) — category-aware soft policy (301 to canonical, 402 for pay-per-crawl, logging). Verified crawlers. Declarative policy that doesn't require per-path rule maintenance.
  3. robots.txt + Content Signalsadvisory declarations (yes-use / no-use per AI-use dimension). No enforcement; AI Crawl Control is how a site actually enforces the Content Signals intent for crawlers that pass through Cloudflare.

The "the web communicates policy via status codes" framing

The 2026-04-17 post explicitly ties AI Crawl Control's value to a broader narrative: "status codes are ultimately how the web communicates policy to crawlers." AI Crawl Control is the product that gives origins a first-class way to pick the right status code per crawler category:

  • 200 OK → default, content served.
  • 301 Moved Permanently → Redirects for AI Training; here's the canonical version.
  • 402 Payment Required → pay-per-crawl; content is behind a license fee.
  • 403 Forbidden → blocked.
  • 404 Not Found → gone.

See patterns/response-status-as-content-policy.

Seen in

Last updated · 200 distilled / 1,178 read