CONCEPT Cited by 1 source

Bot traffic taxonomy

A bot traffic taxonomy is a classification framework that categorizes automated web traffic by observable behavior — what a bot does on a site — rather than by identity or declared purpose alone. The approach acknowledges that "AI" is no longer a useful binary label as AI capabilities permeate search, agents, and training simultaneously.

Definition

A behavior-based taxonomy assigns each bot one or more behavioral classifications from a fixed vocabulary, enabling site owners to make allow/block decisions at the category level rather than per-bot. The key insight is that a single entity (e.g., Googlebot) may simultaneously serve multiple purposes (Search + Training), requiring multi-label classification.
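One way to picture multi-label classification is as a set of flags drawn from a fixed vocabulary. The sketch below is illustrative only: the `Category` enum, the `has_category` and `label_names` helpers, and the `googlebot_labels` value are hypothetical names, not part of any Cloudflare API. The member names are taken from the taxonomy table in this page, and the Search + Training assignment simply mirrors the Googlebot example above.

```python
from enum import Flag, auto


class Category(Flag):
    """Fixed vocabulary of behavioral classifications (names assumed from this page)."""
    SEARCH = auto()
    AGENT = auto()
    TRAINING = auto()
    TRANSACT = auto()
    DATA_COLLECTION = auto()
    SECURITY_TESTING = auto()
    SEO = auto()
    ADS_VERIFICATION = auto()
    SOCIAL_LINK_PREVIEW = auto()
    FEED_FETCHING = auto()
    MONITORING_AND_OPERATIONS = auto()


def has_category(labels: Category, category: Category) -> bool:
    """Return True if the bot's label set includes the given category."""
    return bool(labels & category)


def label_names(labels: Category) -> list:
    """List the names of every category in the label set, in vocabulary order."""
    return [c.name for c in Category if c in labels]


# Hypothetical label assignment: one crawler serving two purposes at once.
googlebot_labels = Category.SEARCH | Category.TRAINING
```

With this representation, a category-level decision only has to ask whether a label set intersects the category in question, for example `has_category(googlebot_labels, Category.TRAINING)`, without keeping a per-bot rule.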

Cloudflare's 10-Category Taxonomy (2026)

Classification            Behavior
Search                    Proactively builds index of site content to respond to future queries
Agent                     Acts in real-time on behalf of a human to complete a task
Training                  Crawls content to train or fine-tune models permanently
Transact                  Checkout/payment actions on behalf of users
Data Collection           Price scraping, competitive intelligence, analytics
Security Testing          Vulnerability scanning, penetration testing
SEO                       Site auditing, accessibility checks
Ads Verification          Ad placement verification, fraud detection
Social/Link Preview       Link previews for social platforms and messaging
Feed Fetching             RSS readers, podcast aggregators, news feeds
Monitoring & Operations   Uptime monitoring, webhooks, health checks

The first three (Search, Agent, and Training) are the customer-configurable AI traffic controls available to all tiers.

Design Principles

  • Multi-label: A bot can hold multiple classifications simultaneously. A multi-purpose crawler is subject to the most restrictive rule that applies to any of its classifications.
  • Behavior over identity: What a bot does matters more than who runs it. An operator whose bots behave differently should run a separate crawler for each behavior.
  • Future-proof framing: Avoids defining "AI" as a static category — instead asks "what are they storing?" and "how will they reshare?"
  • Incentive-compatible: The most-restrictive-rule enforcement creates a natural pressure for operators to separate crawlers by purpose.
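The most-restrictive-rule principle can be sketched as a small resolution function. This is a minimal illustration under stated assumptions, not Cloudflare's implementation: the `Action` enum, the `resolve` function, the policy dictionary, and the choice to allow categories that have no rule are all hypothetical.

```python
from __future__ import annotations

from enum import IntEnum


class Action(IntEnum):
    """Possible decisions, ordered so that a higher value is more restrictive."""
    ALLOW = 0
    BLOCK = 1


def resolve(labels: frozenset[str], policy: dict[str, Action],
            default: Action = Action.ALLOW) -> Action:
    """Apply the most restrictive rule among all of a bot's classifications.

    Categories without an explicit rule fall back to `default` (an assumption
    made for this sketch).
    """
    return max((policy.get(label, default) for label in labels), default=default)


# Hypothetical site policy: search indexing is welcome, training is not.
policy = {"Search": Action.ALLOW, "Training": Action.BLOCK}

# A single mixed-purpose crawler is blocked because one of its labels is blocked.
mixed = resolve(frozenset({"Search", "Training"}), policy)

# An operator that splits its crawlers by purpose keeps search access.
search_only = resolve(frozenset({"Search"}), policy)
training_only = resolve(frozenset({"Training"}), policy)
```

Here `mixed` and `training_only` resolve to `Action.BLOCK` while `search_only` resolves to `Action.ALLOW`, which is the incentive described above: separating crawlers by purpose is the only way for the operator to keep the allowed behavior from being blocked along with the disallowed one.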

Trade-offs

  • Multi-label classification with most-restrictive enforcement is conservative — it may over-block legitimate mixed-purpose crawlers.
  • The taxonomy assumes observable behavioral signals exist to distinguish categories; operators who obfuscate their purpose undermine the model.
  • Category boundaries will shift as technology evolves (e.g., the line between Agent and Search blurs with AI answer engines).
