CONCEPT Cited by 3 sources
Verified Bots¶
Verified bots is the general problem of distinguishing legitimate automated clients (search crawlers, AI training crawlers, uptime monitors, archive spiders, RSS fetchers) from abusive or imposter clients that claim to be them. The answer determines whether an origin serves content, blocks, rate-limits, or — with Cloudflare's 2025-07-01 pay-per-crawl — bills.
Why "verified" is hard¶
- User-agent strings are trivially forgeable. Googlebot/2.1 in a User-Agent header means nothing cryptographically.
- IP allowlists drift. Cloud-egress IPs change without notice; NAT pools are shared; CDNs front multiple bots. IP-based identification hasn't been reliable in many years.
- Reverse-DNS patterns (the classic Googlebot scheme — reverse-DNS the client IP, forward-DNS the result, compare) still rely on DNS trust chains and miss bots running on non-dedicated infrastructure.
- API keys (bot shows X-API-Key: <secret>) are bearer tokens: leak once, impersonate forever; rotation is painful.
- At pay-per-crawl scale, wrong-bot-charged is a financial error, not just a policy one.
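The reverse-DNS scheme in the list above (reverse-DNS the client IP, forward-DNS the result, compare) can be sketched as follows. This is a minimal illustration, not any vendor's reference implementation: the function name `is_forward_confirmed`, the injectable `reverse` / `forward` resolvers, and the suffix tuple are assumptions made for this example, and the suffix list should be taken from the bot operator's own documentation.

```python
import ipaddress
import socket

# Illustrative suffixes only (assumption); use the operator's published list.
EXAMPLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    """A/AAAA lookup: hostname -> set of address strings."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_forward_confirmed(ip, suffixes=EXAMPLE_SUFFIXES,
                         reverse=_reverse, forward=_forward):
    """Return True only if ip -> hostname -> ip round-trips under an allowed suffix."""
    try:
        target = ipaddress.ip_address(ip)
        host = reverse(ip).rstrip(".").lower()
    except (OSError, ValueError):
        return False  # no PTR record, or not a valid IP literal
    if not host.endswith(tuple(suffixes)):
        return False  # PTR points outside the operator's domain
    try:
        # Compare parsed addresses so textual variants of the same IP match.
        return any(ipaddress.ip_address(a) == target for a in forward(host))
    except (OSError, ValueError):
        return False
```

Even when this check passes, the result is only as trustworthy as the DNS answers behind it, and a bot running on shared or non-dedicated infrastructure will have no matching PTR record to check.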
The cryptographic answer¶
Modern verified-bots schemes replace "what IP / DNS / UA is this" with "prove you hold a private key whose public counterpart is published in a directory I can fetch." Concrete instance: Cloudflare's Web Bot Auth —
- Bot operator generates an Ed25519 keypair.
- Publishes the public key at a JWK directory.
- Registers the directory URL + user-agent with Cloudflare.
- Signs every request with RFC 9421 HTTP Message Signatures.
No shared secret ever travels. Every request is independently verifiable against a public directory. Keys can rotate by publishing new JWKs at the same directory. Bot identity is effectively a decentralized PKI, with the directory URL as the trust anchor.
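To make the signing flow more concrete, here is a hedged, standard-library-only sketch of the two deterministic pieces a signer and a verifier must agree on: a key identifier derived from the published JWK (an RFC 7638 thumbprint) and the RFC 9421 signature base, the exact byte string that gets signed. The helper names, the example key and host, the `web-bot-auth` tag value, and the choice to cover only `@authority` are assumptions made for illustration; the actual Ed25519 signing step is omitted because it needs a cryptography library.

```python
import base64
import hashlib
import json


def b64url(data):
    """Base64url without padding, as used by JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk):
    """RFC 7638 thumbprint of an OKP (e.g. Ed25519) public JWK.

    Only the required members (crv, kty, x) are hashed, in lexicographic
    order and with no whitespace, so extra members such as "kid" are ignored.
    """
    required = {k: jwk[k] for k in ("crv", "kty", "x")}
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def signature_params(components, created, expires, keyid, tag):
    """Serialize the covered-component list plus parameters (RFC 9421 inner list)."""
    names = " ".join('"%s"' % name for name, _ in components)
    return '(%s);created=%d;expires=%d;keyid="%s";tag="%s"' % (
        names, created, expires, keyid, tag)


def signature_base(components, params):
    """Build the RFC 9421 signature base: one line per component, params last."""
    lines = ['"%s": %s' % (name, value) for name, value in components]
    lines.append('"@signature-params": %s' % params)
    return "\n".join(lines)  # no trailing newline


# Hypothetical values for illustration only.
example_jwk = {
    "kty": "OKP",
    "crv": "Ed25519",
    "x": "11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo",
}
keyid = jwk_thumbprint(example_jwk)
covered = [("@authority", "example.com")]
params = signature_params(covered, 1700000000, 1700000300, keyid, "web-bot-auth")
base = signature_base(covered, params)
```

In this sketch the bot would sign `base.encode()` with its Ed25519 private key and send the result in the `Signature` header alongside `Signature-Input` carrying `params`. A verifier rebuilds the same base from the request, looks up the key whose thumbprint matches `keyid` in the directory, and checks the signature. Because the key is located by thumbprint, publishing an additional JWK in the directory is enough to start rotating.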
Generalizes beyond Cloudflare / Web Bot Auth¶
The pattern — per-request cryptographic proof of identity over a published public key — generalizes. The same primitive underlies:
- OPKSSH (Cloudflare, 2025-03-25) — OIDC ID Token + public-key commitment = PK Token, attached to an SSH certificate extension to prove a user's identity-plus-keypair.
- SSH-CA-backed access (BLESS, Smallstep) — a short-lived certificate signed by a CA proves both identity and possession of the key.
- mTLS client certificates — X.509-based equivalent.
- Web Bot Auth — RFC 9421-based equivalent, optimized for stateless HTTP request verification.
Same load-bearing invariant in all: patterns/identity-to-key-binding.
Seen in¶
- sources/2025-07-01-cloudflare-pay-per-crawl — Web Bot Auth is the canonical wiki instance.
- sources/2026-04-21-cloudflare-moving-past-bots-vs-humans — positions verified-bot schemes like Web Bot Auth as the identity branch for the small population of high-volume crawlers that tolerate attribution; contrasts with the anonymous branch (Privacy Pass / ARC / ACT) for distributed low-volume clients that need anonymity while still proving behavior. The post argues both branches are necessary — a single primitive does not serve both populations.
- sources/2025-08-04-cloudflare-perplexity-stealth-undeclared-crawlers — canonical wiki instance of the program's enforcement lever: Cloudflare de-listed Perplexity from Verified Bots after confirming via a brand-new-domain experiment that Perplexity ran a stealth crawler to evade origin-side blocks of its declared crawlers. Delisting flips the default posture from "known-good, allow" to "run bot-management scoring." Paired with Web Bot Auth-signed ChatGPT as the positive control, the post operationalizes what "verified" means as a two-sided commitment: publish identity and honor origin directives. See patterns/verified-bot-delisting.
Related¶
- systems/web-bot-auth — canonical deployment.
- concepts/http-message-signatures — the RFC 9421 substrate.
- patterns/signed-bot-request — Ed25519 / JWK / signed-request pattern.
- concepts/pk-token — parallel primitive for user (not bot) verification.
- patterns/identity-to-key-binding — generic verifier invariant.