# Your site, your rules: new AI traffic options for all customers
## Summary
Cloudflare announces a major rearchitecture of its bot classification system, moving from a binary "block AI bots" toggle to a behavior-based taxonomy that classifies all automated traffic by what it does on your site (Search, Agent, Training) and by what it stores or reshares (immediate, reference, full). The post introduces BotBase, a new visibility plane cataloging all tracked bots, and proposes a transitive trust mechanism that uses the RFC 7239 `Forwarded` header to propagate operator identity through multi-layer intermediaries. New defaults, effective 2026-09-15 for newly onboarded domains, will block Training and Agent bots on ad-bearing pages, while "Verified" is decoupled from "default allowed" and tied instead to category-level allowlists.
## Key Takeaways
- Three-category AI traffic model: Cloudflare replaces the single "Block AI Bots" toggle with three orthogonal controls: Search (proactive indexing for later queries), Agent (real-time action on behalf of a human), and Training (permanent content absorption into model weights). These are available to all customers, including the free tier.
- Full bot behavior taxonomy (11 categories): Beyond AI, the complete classification includes Search, Agent, Training, Transact, Data Collection, Security Testing, SEO, Ads Verification, Social/Link Preview, Feed Fetching, and Monitoring & Operations. A single bot can exhibit multiple behaviors.
- Content-use signaling via robots.txt extension: Cloudflare proposes a `use` field extension to Content Signals with three levels: `use=immediate` (interact but store nothing), `use=reference` (index, excerpt, and link back; the new default), and `use=full` (summarize and reproduce). Bots caught abusing these signals lose Verified status.
- New defaults effective 2026-09-15: On pages displaying ads, Training and Agent bots are blocked by default for new domains. Search remains allowed by default. Multi-purpose crawlers (Googlebot, Applebot, Bingbot) that combine Search with Training are blocked under the most-restrictive-rule principle unless the site owner explicitly opts out.
- BotBase as a visibility system: Enterprise Bot Management customers get a dynamic, searchable database of all known bots and agents showing classification, behaviors, content-use level, and verification status. Detection IDs can be copied for use in custom Security rules.
- Verified ≠ default-allowed: Previously, all Verified bots were allowed by default. Now Verified means "allowable within its category": the bot's category must also be enabled. The Verified process is also being opened up; operators must prove both honest self-representation and non-abusive behavior.
- Transitive trust via RFC 7239 `Forwarded` header: For the pattern site owner → bot-owning platform → end user, Cloudflare proposes reusing the existing RFC 7239 `Forwarded` header to propagate operator identity, e.g. `Forwarded: for="openai";use="reference"`. Because more than 20% of web domains sit behind Cloudflare, losing Verified status is a deterrent with teeth.
- Call for crawler separation: Cloudflare urges companies running multiple use cases (Search + Agent + Training) to split them into distinct crawlers for transparency. Multi-purpose crawlers will be subject to the most restrictive rule that applies to any of their behaviors.
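To make the `Forwarded` proposal concrete, here is a minimal Python sketch that splits an RFC 7239 header value into one parameter map per hop, so a receiving site can read the `for` and `use` parameters. The function name `parse_forwarded` is hypothetical; it follows the RFC's element/pair grammar loosely (it is lenient about token characters and does not validate node syntax) and is not Cloudflare's implementation.

```python
def parse_forwarded(header: str) -> list[dict[str, str]]:
    """Split a Forwarded header value into one dict per forwarded-element.

    Elements (one per hop) are separated by ',', pairs within an element by ';'.
    Quoted-string values may contain ',' or ';' and backslash escapes.
    Parameter names are case-insensitive per RFC 7239, so they are lower-cased.
    Node syntax (IP addresses, obfuscated identifiers) is not validated.
    """
    elements: list[dict[str, str]] = []
    current: dict[str, str] = {}
    i, n = 0, len(header)
    while i < n:
        c = header[i]
        if c in " \t;":
            # Skip whitespace and empty pairs.
            i += 1
            continue
        if c == ",":
            # End of one hop's element; empty list elements are ignored.
            if current:
                elements.append(current)
                current = {}
            i += 1
            continue
        eq = header.find("=", i)
        if eq == -1:
            raise ValueError(f"expected name=value at position {i}")
        name = header[i:eq].strip().lower()
        if not name or any(ch in name for ch in ';,"'):
            raise ValueError(f"invalid parameter name {name!r}")
        i = eq + 1
        if i < n and header[i] == '"':
            # quoted-string: read to the closing quote, honoring backslash escapes
            i += 1
            chars: list[str] = []
            while i < n and header[i] != '"':
                if header[i] == "\\" and i + 1 < n:
                    i += 1
                chars.append(header[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted-string")
            value = "".join(chars)
            i += 1  # skip the closing quote
        else:
            # token: read until the next separator or whitespace
            start = i
            while i < n and header[i] not in ";, \t":
                i += 1
            value = header[start:i]
        current[name] = value
    if current:
        elements.append(current)
    return elements


print(parse_forwarded('for="openai";use="reference"'))
```

A site could then compare each hop's `for` value against the operators it treats as Verified and its `use` value against the site's content-use preference. Which hop a site should trust is a policy decision this sketch leaves open.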
## Architectural Design Decisions
- Behavior-based over identity-based classification: Rather than maintaining a binary AI/not-AI distinction, classify by observable behavior. This makes the taxonomy future-proof as "AI" boundaries shift.
- Most-restrictive-rule enforcement: When a crawler declares multiple behaviors, the most restrictive customer rule wins. This creates an incentive for operators to split crawlers by purpose.
- Ad pages as a signal of human-monetization intent: The presence of ads is used as a heuristic proxy for "this page is meant for human attention," triggering stricter defaults.
- Protocol-level trust propagation: Instead of building a new trust registry, reuse an existing RFC (7239) to carry identity through intermediaries, which keeps the mechanism compatible with existing web infrastructure.
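The most-restrictive-rule enforcement and the ad-page defaults can be sketched as a small policy check. This is a minimal Python sketch, not Cloudflare's logic: only the three AI categories are modeled, the bot names are hypothetical, non-ad pages are assumed to allow all three categories (the summary does not state those defaults), and rejecting unverified bots outright is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Behavior(Enum):
    # Only the three AI categories are modeled; the full taxonomy has more.
    SEARCH = "Search"
    AGENT = "Agent"
    TRAINING = "Training"


@dataclass(frozen=True)
class Bot:
    name: str                       # hypothetical identifier
    behaviors: frozenset[Behavior]  # declared behaviors; a bot may have several
    verified: bool


def default_policy(page_has_ads: bool) -> dict[Behavior, bool]:
    """Per-page defaults: on ad-bearing pages, Training and Agent are blocked
    and Search is allowed. Non-ad pages are ASSUMED to allow all three."""
    return {
        Behavior.SEARCH: True,
        Behavior.AGENT: not page_has_ads,
        Behavior.TRAINING: not page_has_ads,
    }


def is_allowed(bot: Bot, policy: dict[Behavior, bool]) -> bool:
    """Most-restrictive-rule check: every declared behavior must be allowed.

    Verified is necessary but not sufficient ("allowable within its category").
    Rejecting unverified bots here is an ASSUMPTION of this sketch; behaviors
    missing from the policy are treated as blocked.
    """
    if not bot.verified or not bot.behaviors:
        return False
    return all(policy.get(b, False) for b in bot.behaviors)


search_only = Bot("example-search-bot", frozenset({Behavior.SEARCH}), verified=True)
multi = Bot("example-multi-bot", frozenset({Behavior.SEARCH, Behavior.TRAINING}), verified=True)

ad_page = default_policy(page_has_ads=True)
print(is_allowed(search_only, ad_page))  # True
print(is_allowed(multi, ad_page))        # False: Training is blocked, so the whole crawler is
# The site owner explicitly opts out of the Training default for this page:
print(is_allowed(multi, {**ad_page, Behavior.TRAINING: True}))  # True
```

The `all(...)` check is what gives operators the incentive described above: a crawler that splits Search and Training into separate identities keeps its Search access even where Training is blocked.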
## Operational Numbers
- Cloudflare serves >20% of web domains — losing Verified status has significant reach-based deterrent value.
- New defaults roll out to all new domains onboarding to Cloudflare as of September 15, 2026.
- Existing customers can opt out before September 15 in Security settings.
## Caveats
- The content-use signaling is a preference, not enforcement: compliance depends on bot operators honoring the `robots.txt` extensions.
- Transitive trust via the `Forwarded` header is still experimental and may not cover privacy-sensitive, small-scale operators who cannot afford to be identifiable.
- The post acknowledges that as bot traffic blends with human traffic, the trust framework won't fit the entire web for all time; private rate limiting is suggested as a complementary mechanism.
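Because the `use` signal only works if crawlers honor it, here is a minimal Python sketch of what honoring it could look like on the crawler side. The three levels come from the post; treating them as an ordered scale, the action names, and the fallback for unrecognized values are assumptions. Reading the value out of `robots.txt` is left out, since the exact syntax is still a proposal.

```python
from enum import IntEnum
from typing import Optional


class Use(IntEnum):
    # ASSUMED ordering, from most to least restrictive.
    IMMEDIATE = 1  # interact with the page, store nothing
    REFERENCE = 2  # index, excerpt, link back (the proposed default)
    FULL = 3       # summarize and reproduce


# Hypothetical crawler actions and the minimum use level each one needs.
REQUIRED_LEVEL = {
    "interact_without_storing": Use.IMMEDIATE,
    "store_in_index": Use.REFERENCE,
    "show_excerpt_with_link": Use.REFERENCE,
    "summarize": Use.FULL,
    "reproduce": Use.FULL,
}


def parse_use(value: Optional[str]) -> Use:
    """No signal -> REFERENCE (the proposed default); an unrecognized value
    -> IMMEDIATE, the most conservative reading (an assumption)."""
    if value is None:
        return Use.REFERENCE
    try:
        return Use[value.strip().upper()]
    except KeyError:
        return Use.IMMEDIATE


def may_perform(action: str, site_use: Optional[str]) -> bool:
    """True if the site's declared use level permits the crawler's action."""
    return REQUIRED_LEVEL[action] <= parse_use(site_use)


print(may_perform("show_excerpt_with_link", None))  # True: reference is the default
print(may_perform("summarize", None))               # False: needs use=full
print(may_perform("store_in_index", "immediate"))   # False: immediate stores nothing
```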
## Source
- Original: https://blog.cloudflare.com/content-independence-day-ai-options/
- Raw markdown: `raw/cloudflare/2026-07-01-your-site-your-rules-new-ai-traffic-options-for-all-customer-53c12c6d.md`
## Related
- systems/cloudflare-bot-management — underlying bot scoring system
- systems/cloudflare-botbase — new visibility plane introduced in this post
- concepts/bot-traffic-taxonomy — behavioral classification framework
- concepts/content-use-signaling — robots.txt content-use extension
- concepts/transitive-trust — identity propagation through intermediaries
- patterns/behavior-based-bot-classification — classify by action not identity
- patterns/content-use-robots-txt-extension — protocol-level content preference
- patterns/forwarded-header-transitive-trust — RFC 7239 for trust chains