CONCEPT Cited by 1 source

Bot traffic taxonomy

A bot traffic taxonomy is a classification framework that categorizes automated web traffic by observable behavior (what a bot does on a site) rather than by identity or declared purpose alone. The approach acknowledges that "AI" is no longer a useful binary label, because AI capabilities now span search, agents, and training at the same time.

Definition

A behavior-based taxonomy assigns each bot one or more behavioral classifications from a fixed vocabulary, enabling site owners to make allow/block decisions at the category level rather than per-bot. The key insight is that a single entity (e.g., Googlebot) may simultaneously serve multiple purposes (Search + Training), requiring multi-label classification.
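A minimal sketch of multi-label assignment, assuming a simple in-memory model: each classification from the table below becomes a `Category` enum member, each bot carries a set of categories, and a site owner's rule targets a category rather than a bot. The bot names, the `BOTS` registry, and `bots_in_category` are hypothetical illustrations, not Cloudflare's data model or API.

```python
from enum import Enum


class Category(Enum):
    """Behavioral classifications, one member per row of the taxonomy table."""
    SEARCH = "Search"
    AGENT = "Agent"
    TRAINING = "Training"
    TRANSACT = "Transact"
    DATA_COLLECTION = "Data Collection"
    SECURITY_TESTING = "Security Testing"
    SEO = "SEO"
    ADS_VERIFICATION = "Ads Verification"
    SOCIAL_LINK_PREVIEW = "Social/Link Preview"
    FEED_FETCHING = "Feed Fetching"
    MONITORING_OPERATIONS = "Monitoring & Operations"


# Hypothetical registry: each bot holds one or more labels (multi-label).
BOTS: dict[str, frozenset[Category]] = {
    "multi-purpose-crawler": frozenset({Category.SEARCH, Category.TRAINING}),
    "shopping-agent": frozenset({Category.AGENT, Category.TRANSACT}),
    "uptime-checker": frozenset({Category.MONITORING_OPERATIONS}),
}


def bots_in_category(category: Category) -> list[str]:
    """Return every bot that a single category-level rule would apply to."""
    return sorted(name for name, labels in BOTS.items() if category in labels)
```

For example, `bots_in_category(Category.TRAINING)` returns `["multi-purpose-crawler"]`: a rule written for Training reaches that crawler without naming it, and a newly registered bot is covered as soon as it is labeled.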

Cloudflare's 10-Category Taxonomy (2026)

Classification	Behavior
Search	Proactively builds index of site content to respond to future queries
Agent	Acts in real-time on behalf of a human to complete a task
Training	Crawls content to train or fine-tune models permanently
Transact	Checkout/payment actions on behalf of users
Data Collection	Price scraping, competitive intelligence, analytics
Security Testing	Vulnerability scanning, penetration testing
SEO	Site auditing, accessibility checks
Ads Verification	Ad placement verification, fraud detection
Social/Link Preview	Link previews for social platforms and messaging
Feed Fetching	RSS readers, podcast aggregators, news feeds
Monitoring & Operations	Uptime monitoring, webhooks, health checks

The first three classifications (Search, Agent, and Training) are the customer-configurable AI traffic controls, available on all tiers.

Design Principles

Multi-label: A bot can hold multiple classifications simultaneously. Multi-purpose crawlers are enforced against the most restrictive applicable rule.
Behavior over identity: What a bot does matters more than who runs it. An operator whose bots exhibit different behaviors should run a separate crawler for each behavior.
Future-proof framing: Avoids defining "AI" as a static category — instead asks "what are they storing?" and "how will they reshare?"
Incentive-compatible: Most-restrictive-rule enforcement creates natural pressure on operators to separate crawlers by purpose.
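The most-restrictive-rule principle can be sketched as a small resolution function. This assumes only two actions, ordered so that `BLOCK` outranks `ALLOW`, and a per-category site policy; the `Action` ordering, `POLICY` table, and `resolve` function are hypothetical, and real enforcement may support more actions than these. Only the three AI categories are modeled to keep the example short.

```python
from enum import Enum, IntEnum


class Category(Enum):
    SEARCH = "Search"
    AGENT = "Agent"
    TRAINING = "Training"


class Action(IntEnum):
    # Higher value = more restrictive, so max() picks the strictest rule.
    ALLOW = 0
    BLOCK = 1


# Hypothetical site policy: allow search indexing and agents, block training.
POLICY: dict[Category, Action] = {
    Category.SEARCH: Action.ALLOW,
    Category.AGENT: Action.ALLOW,
    Category.TRAINING: Action.BLOCK,
}


def resolve(labels: set[Category], policy: dict[Category, Action]) -> Action:
    """Enforce a multi-label bot against its most restrictive applicable rule."""
    # Categories with no rule (and bots with no labels) default to ALLOW here.
    return max((policy.get(c, Action.ALLOW) for c in labels), default=Action.ALLOW)


combined = resolve({Category.SEARCH, Category.TRAINING}, POLICY)  # Action.BLOCK
search_only = resolve({Category.SEARCH}, POLICY)                  # Action.ALLOW
training_only = resolve({Category.TRAINING}, POLICY)              # Action.BLOCK
```

The `combined` crawler is blocked even though the site allows Search, which is the incentive at work: an operator that splits it into a search-only and a training-only crawler keeps its search access while the site still rejects training.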

Trade-offs

Multi-label classification with most-restrictive enforcement is conservative — it may over-block legitimate mixed-purpose crawlers.
The taxonomy assumes observable behavioral signals exist to distinguish categories; operators who obfuscate their purpose undermine the model.
Category boundaries will shift as technology evolves (e.g., the line between Agent and Search blurs with AI answer engines).

Seen In

sources/2026-07-01-cloudflare-ai-traffic-options — canonical definition and rationale
systems/cloudflare-botbase — stores per-bot classifications
systems/cloudflare-bot-management — enforces category-level rules