CLOUDFLARE

Your site, your rules: new AI traffic options for all customers

Summary

Cloudflare announces a major rearchitecture of its bot classification system, moving from a binary "block AI bots" toggle to a behavior-based taxonomy that classifies automated traffic by what it does on your site (Search, Agent, Training) and by what it stores or reshares (immediate, reference, full). The post introduces BotBase, a new visibility plane cataloging all tracked bots, and proposes a transitive trust mechanism that uses the RFC 7239 Forwarded header to propagate operator identity through multi-layer intermediaries. New defaults (effective 2026-09-15) will block Training and Agent bots on ad-bearing pages. The definition of "Verified" is decoupled from "default allowed" and tied to category-level allowlists instead.

Key Takeaways

Three-category AI traffic model: Cloudflare replaces the single "Block AI Bots" toggle with three orthogonal controls — Search (proactive indexing for later queries), Agent (real-time action on behalf of a human), and Training (permanent content absorption into model weights). These are available to all customers including the free tier.
Full bot behavior taxonomy (11 categories): Beyond AI, the complete classification includes Search, Agent, Training, Transact, Data Collection, Security Testing, SEO, Ads Verification, Social/Link Preview, Feed Fetching, and Monitoring & Operations. Each bot can have multiple behaviors.
Content-use signaling via robots.txt extension: Cloudflare proposes a use field extension to Content Signals with three levels: use=immediate (interact but store nothing), use=reference (index, excerpt, link back — the new default), use=full (summarize and reproduce). Bots caught abusing these signals lose Verified status.
New defaults effective 2026-09-15: On pages displaying ads, Training and Agent bots are blocked by default for new domains. Search remains allowed by default. Multi-purpose crawlers (Googlebot, Applebot, BingBot) that combine Search with Training are blocked under the most-restrictive-rule principle unless the site owner explicitly opts out.
BotBase as visibility system: Enterprise Bot Management customers get a dynamic, searchable database of all known bots/agents showing classification, behaviors, content-use level, and verification status. Detection IDs are copyable for use in custom Security rules.
Verified ≠ default-allowed: Previously all Verified bots were default-allowed. Now Verified means "allowable within its category" — you still need the category enabled. The Verified process is being opened up: operators must prove honest self-representation AND non-abusive behavior.
Transitive trust via RFC 7239 Forwarded header: For the pattern (site owner → bot-owning platform → end user), Cloudflare proposes using the existing RFC 7239 Forwarded header to propagate operator identity, for example Forwarded: for="openai";use="reference". Here use is an extension parameter; RFC 7239 itself defines only by, for, host, and proto. Because Cloudflare sits in front of more than 20% of web domains, losing Verified status there is a deterrent with teeth.
Call for crawler separation: Cloudflare urges companies running multiple use-cases (Search + Agent + Training) to separate them into distinct crawlers for transparency. Multi-purpose crawlers will be subject to the most restrictive rule that applies to any of their behaviors.
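The most-restrictive-rule principle for multi-purpose crawlers can be illustrated with a minimal Python sketch. The `Behavior` and `Action` enums, the `decide` function, and the `site_rules` mapping are hypothetical names chosen for illustration; they are not Cloudflare's API, and only the three AI categories are modeled.

```python
from enum import Enum


class Behavior(Enum):
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


class Action(Enum):
    # Higher value means more restrictive (an assumption of this sketch).
    ALLOW = 0
    BLOCK = 1


def decide(behaviors, rules, default=Action.BLOCK):
    """Return the most restrictive action among the rules that match
    any of the crawler's declared behaviors."""
    if not behaviors:
        return default
    actions = (rules.get(b, default) for b in behaviors)
    return max(actions, key=lambda a: a.value)


# Hypothetical site configuration: Search allowed, Agent and Training blocked.
site_rules = {
    Behavior.SEARCH: Action.ALLOW,
    Behavior.AGENT: Action.BLOCK,
    Behavior.TRAINING: Action.BLOCK,
}

search_only = decide({Behavior.SEARCH}, site_rules)
search_plus_training = decide({Behavior.SEARCH, Behavior.TRAINING}, site_rules)
```

Under these rules a Search-only crawler is allowed, while a crawler that combines Search with Training is blocked. That asymmetry is the incentive for operators to split crawlers by purpose.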

Architectural Design Decisions

Behavior-based over identity-based classification: Rather than maintaining a binary AI/not-AI distinction, classify by observable behavior. This makes the taxonomy future-proof as "AI" boundaries shift.
Most-restrictive-rule enforcement: When a crawler declares multiple behaviors, the most restrictive customer rule wins. This creates an incentive for operators to split crawlers by purpose.
Ad pages as signal for human-monetization intent: The presence of ads is used as a heuristic proxy for "this page is meant for human attention," triggering stricter defaults.
Protocol-level trust propagation: Instead of building a new trust registry, leverage an existing RFC (7239) to carry identity through intermediaries — compatible with existing web infrastructure.
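To make the Forwarded-based proposal concrete, here is a rough Python sketch of parsing a header value such as for="openai";use="reference" into one dictionary per hop. The `parse_forwarded` function is a hypothetical helper, the parameter-name pattern is a simplification of the RFC 7239 token grammar, and nothing here verifies that the claimed identity is genuine.

```python
import re

# One forwarded-pair: name "=" (quoted-string | token).
# The name pattern is a simplified subset of the RFC 7239 token grammar.
_PAIR = re.compile(r'\s*([A-Za-z0-9_-]+)=("(?:[^"\\]|\\.)*"|[^;,\s]*)\s*')


def parse_forwarded(value):
    """Parse a Forwarded header value into a list of dicts, one per hop.

    Pairs within a hop are separated by ';', hops by ','.
    """
    hops, current, pos = [], {}, 0
    while pos < len(value):
        match = _PAIR.match(value, pos)
        if not match:
            raise ValueError(f"malformed Forwarded value at offset {pos}")
        name, raw = match.group(1).lower(), match.group(2)
        if raw.startswith('"') and raw.endswith('"') and len(raw) >= 2:
            # Strip the quotes and undo backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        current[name] = raw
        pos = match.end()
        if pos < len(value):
            separator = value[pos]
            if separator == ",":
                hops.append(current)
                current = {}
            elif separator != ";":
                raise ValueError(f"unexpected character at offset {pos}")
            pos += 1
    if current:
        hops.append(current)
    return hops


single = parse_forwarded('for="openai";use="reference"')
chain = parse_forwarded('for="openai";use="reference", for="enduser-app"')
```

In this sketch each intermediary contributes one comma-separated element, so a receiver can walk the list to see which operators claim to be in the chain. How those claims are authenticated is outside the scope of the example.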

Operational Numbers

Cloudflare serves >20% of web domains, so losing Verified status carries a deterrent proportional to that reach.
New defaults roll out to all new domains onboarding to Cloudflare as of September 15, 2026.
Existing customers can opt out in Security settings before September 15, 2026.

Caveats

The content-use signaling is a preference, not enforcement — compliance depends on bot operators honoring robots.txt extensions.
Transitive trust via Forwarded header is still experimental and may not cover privacy-sensitive small-scale operators who cannot afford to be identifiable.
The post acknowledges that as bot traffic blends with human traffic, the trust framework won't fit the entire web for all time — private rate limiting is suggested as a complementary mechanism.
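Since content-use signaling relies on voluntary compliance, it may help to picture the check a well-behaved bot would run before acting on fetched content. The Python sketch below assumes the three levels are cumulative (immediate < reference < full), which the summary implies but does not state outright. The `Use` enum, the `REQUIRED` mapping, and the `permitted` function are hypothetical names, and the robots.txt parsing step is deliberately left out because the exact syntax is not given here.

```python
from enum import IntEnum


class Use(IntEnum):
    # Assumed ordering: each level includes everything below it.
    IMMEDIATE = 0   # interact, store nothing
    REFERENCE = 1   # index, excerpt, link back
    FULL = 2        # summarize and reproduce


# Hypothetical mapping from a bot's intended action to the minimum level it needs.
REQUIRED = {
    "interact": Use.IMMEDIATE,
    "index": Use.REFERENCE,
    "excerpt": Use.REFERENCE,
    "link_back": Use.REFERENCE,
    "summarize": Use.FULL,
    "reproduce": Use.FULL,
}


def permitted(action, site_use=Use.REFERENCE):
    """Return True if the site's declared use level covers the action.

    The default mirrors use=reference being described as the new default.
    """
    return REQUIRED[action] <= site_use


can_excerpt = permitted("excerpt")
can_summarize = permitted("summarize")
```

Nothing in this check is enforced by the protocol. A bot that skips it is caught only through detection, with loss of Verified status as the consequence.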

Source

systems/cloudflare-bot-management — underlying bot scoring system
systems/cloudflare-botbase — new visibility plane introduced in this post
concepts/bot-traffic-taxonomy — behavioral classification framework
concepts/content-use-signaling — robots.txt content-use extension
concepts/transitive-trust — identity propagation through intermediaries
patterns/behavior-based-bot-classification — classify by action not identity
patterns/content-use-robots-txt-extension — protocol-level content preference
patterns/forwarded-header-transitive-trust — RFC 7239 for trust chains