PATTERN Cited by 1 source
Response status as content policy¶
Pattern¶
Use the HTTP response status-code surface as the protocol for per-client-class content policy. When an origin operator wants different policy for different client classes (AI training crawlers vs search engines vs humans vs paying licensees), the answer is not a bespoke signalling layer — it's pick the right status code per class, because that's "how the web communicates policy to crawlers" (Cloudflare, 2026-04-17).
From the 2026-04-17 Redirects for AI Training post:
"A
200 OKmeans content was served. A403 Forbiddenmeans access was blocked. A402 Payment Requiredtells the client it needs to pay for access. Taken together, the distribution of status codes across AI crawler traffic reveals how the web is actually responding to crawlers at scale."
The status-code policy surface¶
| Status | Policy semantics | Example |
|---|---|---|
200 OK |
Content served as-is | Default |
301 Moved Permanently |
Here's the canonical version | Redirects for AI Training — canonical-tag → 301 for AI Crawler category |
402 Payment Required |
License fee required | Pay Per Crawl — [concepts/http-402-payment-required |
403 Forbidden |
Blocked; no negotiation | WAF + bot-management hard-block |
404 Not Found |
Gone; nothing here | Standard HTTP semantics |
451 Unavailable For Legal Reasons |
Blocked for legal reasons | Rare but standardised |
Each status carries well-defined HTTP semantics; there is no need for a parallel vocabulary or bespoke headers.
Canonical instance¶
Cloudflare's 2026-04-17 arc operationalises three status codes as first-class content-policy choices:
301→ Redirects for AI Training — the 2026-04-17 launch feature. Routes AI training crawlers to the canonical URL. Declarative via<link rel="canonical">(patterns/canonical-tag-as-crawler-redirect).402→ Pay Per Crawl — previously launched. Returns [[concepts/http-402-payment-required|402 Payment Required]] to AI crawlers until the crawler pays. Re-uses an HTTP status code reserved since HTTP/1.1 but essentially dormant in production.403→ WAF / bot-management hard-block. The "just reject it" policy, distinct from the categorised-redirect and monetised-access variants above.
All three are surfaced through AI Crawl Control and observable on Radar's new "Response Status Code Analysis" AI-Insights surface.
Measurement / observability¶
Radar AI Insights' 2026-04-17-launched Response Status Code Analysis chart is the measurement complement: which status codes does the web return to AI crawlers in aggregate? Representative numbers from the launch post:
- Aggregate (top-5 AI crawlers): ~74 %
2xx, ~13.7 %4xx, ~11.3 %3xx, ~1.2 %5xx. - GPTBot-specific: ~83 %
2xx, ~10 %4xx, ~5.1 %3xx, ~2.2 %5xx.
Filterable by industry (via Data Explorer) and crawl
purpose (Training / Search / Assistant). The
"redirect-rate" slice of the 3xx bucket is a first-class
measurement primitive for how widespread adoption of
canonical-tag-based training-crawler redirection becomes over
time.
Why this framing is load-bearing¶
- Every HTTP intermediary already understands the status
codes. No new parser, library, or crawler-side change is
needed — the crawler already knows what
301,402,403,404mean. The policy signal rides on well-understood HTTP semantics. - Web-wide measurability. Status-code distributions can be counted at any vantage point (edge, origin, crawler). Cloudflare's Radar surface demonstrates the measurement leverage.
- Polyvalent surface. One status code can mean different
things in different policy lanes:
403from a WAF is a hard-block;403from a paywall means "you haven't paid." The status code is the substrate; the operational meaning is carried by which ruleset emitted it. - Composes with existing advisory layers.
robots.txt, Content Signals,noindexare all advisory — they declare intent. Status codes are enforcement — they execute intent. The pattern fills the gap between "here's what I want" and "here's what I will actually do."
Trade-offs¶
- Status code semantics are coarse. "Why a
402?" or "which license applies?" requires metadata the status line doesn't carry. Payment / license detail typically rides in response body + headers on top of the status code. - Misclassification bites harder on status codes than on
advisory signals. A bot misclassified as an AI crawler
and served a
301to canonical content is a user-visible regression; a bot misclassified and served a Content-Signal declaration it ignores is invisible. - Some policy answers don't have a status code match. "Ingest this for search indexing but not for model training" is expressible in Content Signals but not in any single status code — the categorical response primitive is per-request, not per-use-case. The status-code-as-policy pattern is about enforcement for the class, not usage-type licensing post-fetch.
- Operator discretion without transparency. Every status code carries policy choice that's not visible to the client. Cloudflare does not propose a transparency-disclosure layer ("which URLs were 301'd for which crawler class when?") that would let researchers audit the policy surface across operators.
- The measurement surface captures crawler experience, not
policy intent. Radar shows the web returned
~11.3 %3xxresponses to AI crawlers in aggregate — but that bucket mixes operator-intentional redirects (Redirects for AI Training), operator-agnostic redirects (httpsupgrades, trailing-slash normalisation), and unintended redirects (misconfigured vhosts). Interpretation requires operator context.
Generalises beyond Cloudflare¶
- Any origin / edge operator can implement the pattern — pick the right status code per class. The infrastructure exists in every reverse proxy and every application framework.
- Measurement vantage points: CDN operators (Cloudflare Radar) are one; individual origins with their own telemetry are another; crawler operators themselves (OpenAI, Anthropic, Google, Meta) could publish their own received-status distributions for transparency.
Seen in¶
- sources/2026-04-17-cloudflare-redirects-for-ai-training-enforces-canonical-content
— canonical wiki reference. Names the framing "status codes
are ultimately how the web communicates policy to
crawlers." and anchors three active enforcement lanes on
Cloudflare's edge (
301= Redirects for AI Training,402= Pay Per Crawl,403= WAF block) plus the new Radar measurement surface.
Related¶
- systems/ai-crawl-control — parent product surfacing the per-category status-code policies.
- systems/redirects-for-ai-training —
301-lane instance. - systems/pay-per-crawl —
402-lane instance. - concepts/http-402-payment-required — the dormant HTTP status code revived for pay-per-crawl.
- concepts/content-signals — advisory sibling layer; Content Signals declares intent, this pattern executes it.
- patterns/canonical-tag-as-crawler-redirect — specific
mechanism for the
301lane. - systems/cloudflare-radar — measurement surface (Response Status Code Analysis on AI Insights).