CONCEPT Cited by 1 source
Bot traffic taxonomy
A bot traffic taxonomy is a classification framework that categorizes automated web traffic by observable behavior — what a bot does on a site — rather than by identity or declared purpose alone. The approach acknowledges that "AI" is no longer a useful binary label as AI capabilities permeate search, agents, and training simultaneously.
Definition
A behavior-based taxonomy assigns each bot one or more behavioral classifications from a fixed vocabulary, enabling site owners to make allow/block decisions at the category level rather than per-bot. The key insight is that a single entity (e.g., Googlebot) may simultaneously serve multiple purposes (Search + Training), requiring multi-label classification.
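A minimal sketch of the multi-label idea, assuming each bot record simply carries a set of labels drawn from a fixed vocabulary. The names `Classification`, `BotRecord`, `bots_in_category`, and the bot `example-feed-reader` are hypothetical and chosen for illustration; they are not Cloudflare's schema or API. The Googlebot labels follow the Search + Training example above.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    """Fixed vocabulary of behavioral labels (names follow the taxonomy table)."""
    SEARCH = "Search"
    AGENT = "Agent"
    TRAINING = "Training"
    TRANSACT = "Transact"
    DATA_COLLECTION = "Data Collection"
    SECURITY_TESTING = "Security Testing"
    SEO = "SEO"
    ADS_VERIFICATION = "Ads Verification"
    SOCIAL_LINK_PREVIEW = "Social/Link Preview"
    FEED_FETCHING = "Feed Fetching"
    MONITORING_OPERATIONS = "Monitoring & Operations"


@dataclass(frozen=True)
class BotRecord:
    """Hypothetical record: one bot, one or more behavioral labels."""
    name: str
    classifications: frozenset[Classification]


def bots_in_category(bots: list[BotRecord], category: Classification) -> list[str]:
    """Return the names of all bots that carry the given label."""
    return [bot.name for bot in bots if category in bot.classifications]


bots = [
    # Multi-label: a single entity serving two purposes at once.
    BotRecord("Googlebot", frozenset({Classification.SEARCH, Classification.TRAINING})),
    # Single-label bot (hypothetical name).
    BotRecord("example-feed-reader", frozenset({Classification.FEED_FETCHING})),
]

print(bots_in_category(bots, Classification.TRAINING))
```

Because the labels form a set rather than a single field, a category-level decision about Training reaches Googlebot without anyone writing a per-bot rule.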
Cloudflare's Taxonomy (2026)
| Classification | Behavior |
|---|---|
| **Search** | Proactively builds index of site content to respond to future queries |
| **Agent** | Acts in real-time on behalf of a human to complete a task |
| **Training** | Crawls content to train or fine-tune models permanently |
| Transact | Checkout/payment actions on behalf of users |
| Data Collection | Price scraping, competitive intelligence, analytics |
| Security Testing | Vulnerability scanning, penetration testing |
| SEO | Site auditing, accessibility checks |
| Ads Verification | Ad placement verification, fraud detection |
| Social/Link Preview | Link previews for social platforms and messaging |
| Feed Fetching | RSS readers, podcast aggregators, news feeds |
| Monitoring & Operations | Uptime monitoring, webhooks, health checks |
The first three classifications (Search, Agent, and Training) are the customer-configurable AI traffic controls available to all tiers.
Design Principles
- Multi-label: A bot can hold multiple classifications simultaneously. A multi-purpose crawler is subject to the most restrictive rule that applies to any of its classifications.
- Behavior over identity: What a bot does matters more than who runs it. An operator whose bots behave differently should run a separate crawler for each behavior.
- Future-proof framing: Avoids defining "AI" as a static category — instead asks "what are they storing?" and "how will they reshare?"
- Incentive-compatible: The most-restrictive-rule enforcement creates a natural pressure for operators to separate crawlers by purpose.
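The most-restrictive-rule principle can be sketched as follows. This is an illustration under stated assumptions, not Cloudflare's implementation: the `Action` enum, the `decide` function, the rule dictionary, and the choice to allow categories with no rule are all hypothetical.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical actions, ordered so that a higher value is more restrictive."""
    ALLOW = 0
    BLOCK = 1


def decide(classifications: set[str], rules: dict[str, Action],
           default: Action = Action.ALLOW) -> Action:
    """Return the most restrictive action among the rules that apply to a bot.

    Categories without an explicit rule fall back to `default`
    (an assumption made for this sketch).
    """
    applicable = [rules.get(label, default) for label in classifications]
    return max(applicable, default=default)


# A site owner allows search indexing but blocks training crawls.
rules = {"Search": Action.ALLOW, "Training": Action.BLOCK}

mixed_crawler = {"Search", "Training"}   # one crawler, two purposes
search_only = {"Search"}                 # operator split its crawlers
training_only = {"Training"}

print(decide(mixed_crawler, rules).name)
print(decide(search_only, rules).name)
print(decide(training_only, rules).name)
```

Under these rules the mixed-purpose crawler is blocked outright, while an operator that splits the same work across two crawlers keeps its search crawler allowed. That difference is the incentive the last principle describes.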
Trade-offs
- Multi-label classification with most-restrictive enforcement is conservative — it may over-block legitimate mixed-purpose crawlers.
- The taxonomy assumes observable behavioral signals exist to distinguish categories; operators who obfuscate their purpose undermine the model.
- Category boundaries will shift as technology evolves (e.g., the line between Agent and Search blurs with AI answer engines).
Seen In
- sources/2026-07-01-cloudflare-ai-traffic-options — canonical definition and rationale
- systems/cloudflare-botbase — stores per-bot classifications
- systems/cloudflare-bot-management — enforces category-level rules