CLOUDFLARE

Your site, your rules: new AI traffic options for all customers

Summary

Cloudflare announces a major rearchitecture of its bot classification system, moving from a binary "block AI bots" toggle to a behavior-based taxonomy that classifies automated traffic by what it does on your site (Search, Agent, Training) and by what it stores or reshares (immediate, reference, full). The post introduces BotBase, a new visibility plane cataloging all tracked bots, and proposes a transitive trust mechanism that uses the RFC 7239 Forwarded header to propagate operator identity through multi-layer intermediaries. New defaults (effective 2026-09-15) will block Training and Agent bots on ad-bearing pages for new domains, while "Verified" is decoupled from "default allowed" and tied to category-level allowlists instead.

Key Takeaways

  1. Three-category AI traffic model: Cloudflare replaces the single "Block AI Bots" toggle with three orthogonal controls — Search (proactive indexing for later queries), Agent (real-time action on behalf of a human), and Training (permanent content absorption into model weights). These are available to all customers including the free tier.

  2. Full bot behavior taxonomy (11 categories): Beyond AI, the complete classification includes: Search, Agent, Training, Transact, Data Collection, Security Testing, SEO, Ads Verification, Social/Link Preview, Feed Fetching, Monitoring & Operations. Each bot can have multiple behaviors.

  3. Content-use signaling via robots.txt extension: Cloudflare proposes a use field extension to Content Signals with three levels: use=immediate (interact but store nothing), use=reference (index, excerpt, link back — the new default), use=full (summarize and reproduce). Bots caught abusing these signals lose Verified status.

  4. New defaults effective 2026-09-15: On pages displaying ads, Training and Agent bots are blocked by default for new domains. Search remains allowed by default. Multi-purpose crawlers (Googlebot, Applebot, BingBot) that combine Search with Training are blocked under the most-restrictive-rule principle unless the site owner explicitly opts out.

  5. BotBase as visibility system: Enterprise Bot Management customers get a dynamic, searchable database of all known bots/agents showing classification, behaviors, content-use level, and verification status. Detection IDs are copyable for use in custom Security rules.

  6. Verified ≠ default-allowed: Previously all Verified bots were default-allowed. Now Verified means "allowable within its category" — you still need the category enabled. The Verified process is being opened up: operators must prove honest self-representation AND non-abusive behavior.

  7. Transitive trust via RFC 7239 Forwarded header: For the pattern (site owner → bot-owning platform → end user), Cloudflare proposes using the existing RFC 7239 Forwarded header to propagate operator identity, for example Forwarded: for="openai";use="reference". The deterrent against misrepresentation is reach: Cloudflare sits in front of more than 20% of web domains, so losing Verified status affects an operator's access across a large share of the web.

  8. Call for crawler separation: Cloudflare urges companies running multiple use-cases (Search + Agent + Training) to separate them into distinct crawlers for transparency. Multi-purpose crawlers will be subject to the most restrictive rule that applies to any of their behaviors.
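
The most-restrictive-rule principle from takeaways 4 and 8 can be sketched in a few lines of Python. This is an illustration only, not Cloudflare's implementation: the Action enum, the DEFAULT_RULES mapping, the resolve_action helper, and the choice to allow behaviors that have no rule are all assumptions made for the example.

```python
from enum import IntEnum


class Action(IntEnum):
    # Higher value = more restrictive. This ordering is an assumption of the sketch.
    ALLOW = 0
    BLOCK = 1


# Hypothetical per-category settings mirroring the ad-page defaults described above.
DEFAULT_RULES = {
    "Search": Action.ALLOW,
    "Agent": Action.BLOCK,
    "Training": Action.BLOCK,
}


def resolve_action(behaviors, rules, fallback=Action.ALLOW):
    """Return the most restrictive action among the rules matching a bot's behaviors.

    Behaviors without a rule use `fallback` (an assumption, not stated in the post).
    """
    actions = [rules.get(behavior, fallback) for behavior in behaviors]
    return max(actions, default=fallback)


# A single-purpose search crawler passes; one that also trains is blocked.
search_only = resolve_action({"Search"}, DEFAULT_RULES)
search_and_training = resolve_action({"Search", "Training"}, DEFAULT_RULES)
```

Under these assumed rules, a crawler that declares both Search and Training resolves to BLOCK, which is the incentive for operators to split crawlers by purpose.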

Architectural Design Decisions

  • Behavior-based over identity-based classification: Rather than maintaining a binary AI/not-AI distinction, classify by observable behavior. This makes the taxonomy future-proof as "AI" boundaries shift.
  • Most-restrictive-rule enforcement: When a crawler declares multiple behaviors, the most restrictive customer rule wins. This creates an incentive for operators to split crawlers by purpose.
  • Ad pages as signal for human-monetization intent: The presence of ads is used as a heuristic proxy for "this page is meant for human attention," triggering stricter defaults.
  • Protocol-level trust propagation: Instead of building a new trust registry, leverage an existing RFC (7239) to carry identity through intermediaries — compatible with existing web infrastructure.
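
To make the protocol-level idea concrete, here is a minimal Python sketch that parses a Forwarded header value following the RFC 7239 grammar (comma-separated elements, each made of semicolon-separated name=value pairs, where a value is a token or a quoted string). RFC 7239 itself defines the by, for, host, and proto parameters and permits extensions; the use parameter comes from the proposal as summarized here and is treated as an unregistered extension. The parse_forwarded function name is hypothetical.

```python
import re

# token characters per RFC 7230; parameter values are either a token or a quoted-string.
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
_PAIR = re.compile(
    r'\s*(' + _TOKEN + r')=(?:"((?:[^"\\]|\\.)*)"|(' + _TOKEN + r'))\s*'
)


def parse_forwarded(header):
    """Parse a Forwarded header value into a list of dicts, one per forwarded element."""
    hops, current, pos = [], {}, 0
    while pos < len(header):
        match = _PAIR.match(header, pos)
        if not match:
            raise ValueError(f"malformed Forwarded header at offset {pos}")
        name, quoted, bare = match.groups()
        # Undo backslash escapes inside quoted strings; bare tokens are used as-is.
        value = re.sub(r"\\(.)", r"\1", quoted) if quoted is not None else bare
        current[name.lower()] = value  # parameter names are case-insensitive
        pos = match.end()
        if pos < len(header):
            separator = header[pos]
            if separator == ",":  # a comma starts the next element (next hop)
                hops.append(current)
                current = {}
            elif separator != ";":  # a semicolon continues the same element
                raise ValueError(f"unexpected {separator!r} at offset {pos}")
            pos += 1
    if current:
        hops.append(current)
    return hops


# The example value from the summary; "use" is the proposed extension parameter.
claim = parse_forwarded('for="openai";use="reference"')
```

Parsing only recovers what the sender asserted. Whether the claimed operator is honest is a separate question, which is the role the summary assigns to Verified status.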

Operational Numbers

  • Cloudflare serves >20% of web domains — losing Verified status has significant reach-based deterrent value.
  • New defaults roll out to all new domains onboarding to Cloudflare as of September 15, 2026.
  • Existing customers can opt out before September 15 in Security settings.

Caveats

  • The content-use signaling is a preference, not enforcement — compliance depends on bot operators honoring robots.txt extensions.
  • Transitive trust via Forwarded header is still experimental and may not cover privacy-sensitive small-scale operators who cannot afford to be identifiable.
  • The post acknowledges that as bot traffic blends with human traffic, the trust framework won't fit the entire web for all time — private rate limiting is suggested as a complementary mechanism.
