PATTERN Cited by 1 source

Response status as content policy¶

Pattern¶

Use the HTTP response status-code surface as the protocol for per-client-class content policy. When an origin operator wants different policy for different client classes (AI training crawlers vs search engines vs humans vs paying licensees), the answer is not a bespoke signalling layer — it's pick the right status code per class, because that's "how the web communicates policy to crawlers" (Cloudflare, 2026-04-17).

From the 2026-04-17 Redirects for AI Training post:

"A 200 OK means content was served. A 403 Forbidden means access was blocked. A 402 Payment Required tells the client it needs to pay for access. Taken together, the distribution of status codes across AI crawler traffic reveals how the web is actually responding to crawlers at scale."

The status-code policy surface¶

Status	Policy semantics	Example
`200 OK`	Content served as-is	Default
`301 Moved Permanently`	Here's the canonical version	Redirects for AI Training — canonical-tag → `301` for AI Crawler category
`402 Payment Required`	License fee required	Pay Per Crawl — [concepts/http-402-payment-required
`403 Forbidden`	Blocked; no negotiation	WAF + bot-management hard-block
`404 Not Found`	Gone; nothing here	Standard HTTP semantics
`451 Unavailable For Legal Reasons`	Blocked for legal reasons	Rare but standardised

Each status carries well-defined HTTP semantics; there is no need for a parallel vocabulary or bespoke headers.

Canonical instance¶

Cloudflare's 2026-04-17 arc operationalises three status codes as first-class content-policy choices:

301 → Redirects for AI Training — the 2026-04-17 launch feature. Routes AI training crawlers to the canonical URL. Declarative via <link rel="canonical"> (patterns/canonical-tag-as-crawler-redirect).
402 → Pay Per Crawl — previously launched. Returns [[concepts/http-402-payment-required|402 Payment Required]] to AI crawlers until the crawler pays. Re-uses an HTTP status code reserved since HTTP/1.1 but essentially dormant in production.
403 → WAF / bot-management hard-block. The "just reject it" policy, distinct from the categorised-redirect and monetised-access variants above.

All three are surfaced through AI Crawl Control and observable on Radar's new "Response Status Code Analysis" AI-Insights surface.

Measurement / observability¶

Radar AI Insights' 2026-04-17-launched Response Status Code Analysis chart is the measurement complement: which status codes does the web return to AI crawlers in aggregate? Representative numbers from the launch post:

Aggregate (top-5 AI crawlers): ~74 % 2xx, ~13.7 % 4xx, ~11.3 % 3xx, ~1.2 % 5xx.
GPTBot-specific: ~83 % 2xx, ~10 % 4xx, ~5.1 % 3xx, ~2.2 % 5xx.

Filterable by industry (via Data Explorer) and crawl purpose (Training / Search / Assistant). The "redirect-rate" slice of the 3xx bucket is a first-class measurement primitive for how widespread adoption of canonical-tag-based training-crawler redirection becomes over time.

Why this framing is load-bearing¶

Every HTTP intermediary already understands the status codes. No new parser, library, or crawler-side change is needed — the crawler already knows what 301, 402, 403, 404 mean. The policy signal rides on well-understood HTTP semantics.
Web-wide measurability. Status-code distributions can be counted at any vantage point (edge, origin, crawler). Cloudflare's Radar surface demonstrates the measurement leverage.
Polyvalent surface. One status code can mean different things in different policy lanes: 403 from a WAF is a hard-block; 403 from a paywall means "you haven't paid." The status code is the substrate; the operational meaning is carried by which ruleset emitted it.
Composes with existing advisory layers. robots.txt, Content Signals, noindex are all advisory — they declare intent. Status codes are enforcement — they execute intent. The pattern fills the gap between "here's what I want" and "here's what I will actually do."

Trade-offs¶

Status code semantics are coarse. "Why a 402?" or "which license applies?" requires metadata the status line doesn't carry. Payment / license detail typically rides in response body + headers on top of the status code.
Misclassification bites harder on status codes than on advisory signals. A bot misclassified as an AI crawler and served a 301 to canonical content is a user-visible regression; a bot misclassified and served a Content-Signal declaration it ignores is invisible.
Some policy answers don't have a status code match. "Ingest this for search indexing but not for model training" is expressible in Content Signals but not in any single status code — the categorical response primitive is per-request, not per-use-case. The status-code-as-policy pattern is about enforcement for the class, not usage-type licensing post-fetch.
Operator discretion without transparency. Every status code carries policy choice that's not visible to the client. Cloudflare does not propose a transparency-disclosure layer ("which URLs were 301'd for which crawler class when?") that would let researchers audit the policy surface across operators.
The measurement surface captures crawler experience, not policy intent. Radar shows the web returned ~11.3 % 3xx responses to AI crawlers in aggregate — but that bucket mixes operator-intentional redirects (Redirects for AI Training), operator-agnostic redirects (https upgrades, trailing-slash normalisation), and unintended redirects (misconfigured vhosts). Interpretation requires operator context.

Generalises beyond Cloudflare¶

Any origin / edge operator can implement the pattern — pick the right status code per class. The infrastructure exists in every reverse proxy and every application framework.
Measurement vantage points: CDN operators (Cloudflare Radar) are one; individual origins with their own telemetry are another; crawler operators themselves (OpenAI, Anthropic, Google, Meta) could publish their own received-status distributions for transparency.

Seen in¶

sources/2026-04-17-cloudflare-redirects-for-ai-training-enforces-canonical-content — canonical wiki reference. Names the framing "status codes are ultimately how the web communicates policy to crawlers." and anchors three active enforcement lanes on Cloudflare's edge (301 = Redirects for AI Training, 402 = Pay Per Crawl, 403 = WAF block) plus the new Radar measurement surface.

systems/ai-crawl-control — parent product surfacing the per-category status-code policies.
systems/redirects-for-ai-training — 301-lane instance.
systems/pay-per-crawl — 402-lane instance.
concepts/http-402-payment-required — the dormant HTTP status code revived for pay-per-crawl.
concepts/content-signals — advisory sibling layer; Content Signals declares intent, this pattern executes it.
patterns/canonical-tag-as-crawler-redirect — specific mechanism for the 301 lane.
systems/cloudflare-radar — measurement surface (Response Status Code Analysis on AI Insights).