
CLOUDFLARE 2025-07-01 Tier 1


Cloudflare: Introducing pay per crawl — Enabling content owners to charge AI crawlers for access

Summary

Cloudflare announces Pay Per Crawl (private beta, 2025-07-01), a framework that lets publishers monetize AI-crawler access to their content at internet scale by reviving the mostly unused HTTP 402 Payment Required response code. For each request a site owner chooses one of three outcomes: Allow (free access), Charge (a flat, domain-wide per-request price signaled via 402), or Block (access denied). Charge applied to a crawler that has no billing relationship behaves like a 403 but still advertises that a paid relationship could exist. The chosen outcome is enforced by a rules engine that runs after the site's WAF and bot-management layer. Anti-spoofing uses Cloudflare's Web Bot Auth proposal: crawlers generate an Ed25519 keypair, publish the JWK-formatted public key at a hosted directory, register the directory URL and user agent with Cloudflare, and sign every request with HTTP Message Signatures (RFC 9421) carrying signature-agent, signature-input, and signature headers. There are two price-negotiation flows: reactive (the crawler requests, gets 402 + crawler-price, then retries with crawler-exact-price) and preemptive (the crawler sends crawler-max-price up front and is served with crawler-charged if the configured price ≤ max). Cloudflare acts as the Merchant of Record: it aggregates billing events across all participating publishers and crawlers, charges crawlers, and distributes earnings. The post frames this as groundwork for a future agentic paywall where AI agents programmatically negotiate content access within a spending budget.
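
The decision logic described above can be sketched as a small function. This is an illustrative model only, not Cloudflare's implementation: the names Rule, parse_usd, and decide are hypothetical, header names are assumed to be lowercased, and two behaviors the summary leaves open are filled in as labeled assumptions (a Charge rule with no billing relationship returns 402 with the price and no content; a request carrying both price headers is rejected with an exception).

```python
from decimal import Decimal
from enum import Enum
from typing import Dict, Tuple


class Rule(Enum):
    ALLOW = "allow"
    CHARGE = "charge"
    BLOCK = "block"


def parse_usd(value: str) -> Decimal:
    """Parse a header value of the form 'USD 0.01' into a Decimal amount."""
    currency, amount = value.split()
    if currency != "USD":
        raise ValueError(f"unsupported currency: {currency}")
    return Decimal(amount)


def decide(
    rule: Rule,
    configured_price: Decimal,
    headers: Dict[str, str],
    has_billing: bool = True,
) -> Tuple[int, Dict[str, str]]:
    """Return (status, response headers) for one crawler request."""
    if rule is Rule.ALLOW:
        return 200, {}
    if rule is Rule.BLOCK:
        return 403, {}

    offer = {"crawler-price": f"USD {configured_price}"}
    if not has_billing:
        # Assumption: no content is served, but the price is still advertised.
        return 402, offer

    exact = headers.get("crawler-exact-price")
    maximum = headers.get("crawler-max-price")
    if exact is not None and maximum is not None:
        # Only one of the two headers is allowed; the error shape is assumed.
        raise ValueError("send crawler-exact-price or crawler-max-price, not both")

    charged = {"crawler-charged": f"USD {configured_price}"}
    if exact is not None and parse_usd(exact) == configured_price:
        return 200, charged  # reactive flow: retry at the quoted price
    if maximum is not None and configured_price <= parse_usd(maximum):
        return 200, charged  # preemptive flow: billed the configured price, not the max
    return 402, offer
```

Note that in the preemptive branch the crawler is billed the configured price rather than its stated maximum, which matches the crawler-charged behavior described in the summary.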

Key takeaways

  1. HTTP 402 as the negotiation primitive. Cloudflare picks the long-dormant 402 Payment Required status code (reserved for future use since HTTP/1.1 and rarely used in practice) as the signal that a resource requires payment, rather than inventing a new code. The response carries a crawler-price: USD XX.XX header. Because 402 is a standard 4xx status, existing clients, proxies, and log collectors already treat the response as a client error, so the crawler does not have to handle anything custom at the network layer. (Source: sources/2025-07-01-cloudflare-pay-per-crawl)

  2. Anti-spoofing via Web Bot Auth, not IP allowlists. Because the crawler pays, Cloudflare must be certain which crawler is making each request. Allowlists of crawler IPs aren't good enough — anyone on the same egress IP could spoof. Web Bot Auth requires crawlers to (a) generate an Ed25519 keypair, (b) publish the JWK-formatted public key in a hosted directory, (c) register the directory URL and user-agent with Cloudflare, and (d) sign every HTTP request with RFC 9421 HTTP Message Signatures using that key. Cryptographic identity, not network identity. (Source: sources/2025-07-01-cloudflare-pay-per-crawl)

  3. Three request headers carry the signature. Signed crawler requests include signature-agent (directory URL identifying the bot operator), signature-input (covered fields, keyid, algorithm ed25519, created timestamp, expiry, nonce, tag web-bot-auth), and signature (the Ed25519 signature bytes, base64-encoded as an RFC 9421 byte sequence). Cloudflare resolves the signature-agent URL, fetches the JWK directory, selects the key matching keyid, and verifies the signature. The web-bot-auth tag distinguishes this use of HTTP Message Signatures from other deployments. (Source: sources/2025-07-01-cloudflare-pay-per-crawl)

  4. Two price-negotiation flows — reactive and preemptive. Reactive: crawler requests the resource blind → receives HTTP 402 Payment Required + crawler-price: USD XX.XX → retries with crawler-exact-price: USD XX.XX declaring willingness to pay that price. Preemptive: crawler includes crawler-max-price: USD XX.XX on the first request; if configured-price ≤ max-price, the server responds 200 OK with crawler-charged: USD XX.XX (the configured price, not the max). If configured price exceeds the max, server replies 402 with the configured crawler-price as usual. Only one of crawler-exact-price or crawler-max-price is allowed per request. (Source: sources/2025-07-01-cloudflare-pay-per-crawl)

  5. Rules engine runs after WAF and bot-management. Allow / Charge / Block decisions are applied only after the site's existing WAF policies and bot-management / bot-blocking features have fired. Lets publishers keep their security posture unchanged — they layer monetization on top, not through it. Architecturally: bot auth lives in a later pipeline stage than access / threat controls. (Source: sources/2025-07-01-cloudflare-pay-per-crawl)

  6. "Charge" without a billing relationship is a 403 with a future option. If a publisher selects Charge for a crawler that doesn't have a billing relationship with Cloudflare and therefore can't be charged, the effect is identical to 403 Forbidden (no content returned) — but the crawler is told pricing exists and a future relationship is possible. This is intentional: the crawler operator sees "you could access this for $X if you opted in" instead of a silent network-level block. Turns block-for-unknown-bots into a standing offer. (Source: sources/2025-07-01-cloudflare-pay-per-crawl)

  7. Cloudflare is Merchant of Record. Billing events emit when a signed crawler request with payment intent receives an HTTP-200-family response with a crawler-charged header. Cloudflare aggregates the events, charges the crawler, and distributes earnings to publishers. Eliminates the bilateral-contract coordination problem — historically charging a crawler required "knowing the right individual and striking a one-off deal", insurmountable for small publishers. A single intermediary turns it into an N-to-M marketplace. (Source: sources/2025-07-01-cloudflare-pay-per-crawl)

  8. Priced flat-per-request domain-wide initially, with per-crawler overrides. Publishers configure one flat price across their entire site; they can bypass the charge for specific crawlers (e.g. a pre-negotiated partnership, a free crawler they want to allow). Explicitly not supported at launch: per-path, per-content-type, or dynamic (demand-based) pricing — Cloudflare flags these as things they expect to evolve. Granular-licensing (training vs inference vs search) is future scope. (Source: sources/2025-07-01-cloudflare-pay-per-crawl)

  9. Agentic paywall is the stated end-state. Cloudflare frames HTTP-402 as the substrate for a future where AI agents hit 402s, consult a user-granted budget, decide whether to pay, and retry with crawler-exact-price — all programmatically. "Imagine asking your favorite deep research program to help you synthesize the latest cancer research ... and then giving that agent a budget to spend to acquire the best and most relevant content." The pay-per-crawl protocol is deliberately designed to generalize from crawler-to-publisher to agent-to-resource. (Source: sources/2025-07-01-cloudflare-pay-per-crawl)
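
The budget-driven retry loop in takeaway 9 can be sketched from the agent's side. This is a hedged illustration under stated assumptions: plan_retry, settle, and parse_usd are hypothetical helper names, header names are assumed to be lowercased, and the budget is modeled as a single USD amount rather than any policy format from the post.

```python
from decimal import Decimal
from typing import Dict, Optional


def parse_usd(value: str) -> Decimal:
    """Parse a header value of the form 'USD 0.01' into a Decimal amount."""
    currency, amount = value.split()
    if currency != "USD":
        raise ValueError(f"unsupported currency: {currency}")
    return Decimal(amount)


def plan_retry(
    status: int, response_headers: Dict[str, str], budget: Decimal
) -> Optional[Dict[str, str]]:
    """Return headers for a paid retry, or None if the agent should not pay."""
    if status != 402 or "crawler-price" not in response_headers:
        return None
    quoted = response_headers["crawler-price"]
    if parse_usd(quoted) > budget:
        return None  # quoted price exceeds what the user granted
    # Reactive flow: echo the quoted price back as the exact price.
    return {"crawler-exact-price": quoted}


def settle(status: int, response_headers: Dict[str, str], budget: Decimal) -> Decimal:
    """Return the remaining budget after a response, deducting any charge."""
    if 200 <= status < 300 and "crawler-charged" in response_headers:
        return budget - parse_usd(response_headers["crawler-charged"])
    return budget
```

An agent would call plan_retry after each 402, resend the request with the returned header if one is produced, and then pass the final response through settle so later requests see the reduced budget.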


Operational numbers

  • Ed25519 signatures on every request (compact and fast among modern public-key signature schemes: 64-byte signatures, designed for constant-time implementation).
  • Signature window: created and expires timestamps in signature-input (example in post uses a 1-hour window, 1735689600 → 1735693200).
  • Nonce: randomly generated per request, base64-url, anti-replay (example in post is 64 bytes).
  • Launch shape: flat per-request USD price domain-wide.
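
To make the created / expires window and the nonce concrete, here is a minimal sketch of how a crawler might assemble the three signature headers. It follows the RFC 9421 signature-base layout for the components "@authority" and "signature-agent"; the choice of covered components, the function name build_signature_headers, the label sig1, and the injected sign callable are assumptions for illustration. Actual Ed25519 signing is left to the caller (for example, a function wrapping an Ed25519 private key), so the block itself only needs the standard library.

```python
import base64
import secrets
from typing import Callable, Dict, Optional


def build_signature_headers(
    authority: str,
    signature_agent: str,
    keyid: str,
    created: int,
    sign: Callable[[bytes], bytes],
    lifetime: int = 3600,  # 1-hour window, as in the post's example
    nonce: Optional[str] = None,
    label: str = "sig1",
) -> Dict[str, str]:
    """Build Signature-Agent, Signature-Input, and Signature header values."""
    if nonce is None:
        # 64 random bytes, base64url-encoded, fresh for every request.
        nonce = base64.urlsafe_b64encode(secrets.token_bytes(64)).decode("ascii")

    agent_value = f'"{signature_agent}"'  # serialized as a quoted string
    params = (
        '("@authority" "signature-agent")'
        f";created={created}"
        f';keyid="{keyid}"'
        ';alg="ed25519"'
        f";expires={created + lifetime}"
        f';nonce="{nonce}"'
        ';tag="web-bot-auth"'
    )
    # Signature base: one line per covered component, then the parameters line.
    base = (
        f'"@authority": {authority}\n'
        f'"signature-agent": {agent_value}\n'
        f'"@signature-params": {params}'
    )
    signature = sign(base.encode("ascii"))
    return {
        "Signature-Agent": agent_value,
        "Signature-Input": f"{label}={params}",
        # Byte sequences are carried as standard base64 between colons.
        "Signature": f"{label}=:{base64.b64encode(signature).decode('ascii')}:",
    }
```

Because the parameters string is reused verbatim in both the signature base and the Signature-Input header, a verifier that rebuilds the base from the received headers arrives at the same bytes that were signed.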

Caveats / gaps

  • Private beta at post time. No crawler counts, no publisher counts, no billing-event throughput, no latency overhead of the signature-verification path, no price-discovery data.
  • Crawler must publish a JWK directory — imposes a static hosting requirement on every participating bot operator. No fallback for crawlers that can't run a web directory.
  • Only one crawler-exact-price OR crawler-max-price per request — no way to send both, no way to express "I'll pay up to X for content A but up to Y for content B" in a single request.
  • Flat-price-only at launch. No per-path, per-content-type, per-rate-tier, or granular-license (training vs inference) pricing.
  • Payment settlement details undocumented. How crawler bills are settled (prepaid balance? invoice? card-on-file?), dispute resolution, refund mechanics, publisher minimum-payout thresholds all unspecified in the launch post.
  • Bot-auth identity revocation path unspecified. If a crawler's Ed25519 key leaks, the JWK directory presumably rotates, but the grace period / propagation / in-flight-signed-request handling aren't described.
  • No discussion of how non-Cloudflare-fronted sites participate. Cloudflare acts as the Merchant of Record, so participating sites appear to need a Cloudflare zone — origin-direct sites without Cloudflare in front cannot plug into the marketplace.
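
The JWK-directory requirement and the unspecified rotation path both come down to how a verifier maps a keyid to a published key. The sketch below assumes, without confirmation from the launch post, that keyid is the RFC 7638 JWK thumbprint (SHA-256, base64url without padding) of the Ed25519 public key and that the directory is a JSON object with a "keys" array; jwk_thumbprint and find_key are hypothetical helper names.

```python
import base64
import hashlib
import json
from typing import Any, Dict, Optional


def jwk_thumbprint(jwk: Dict[str, Any]) -> str:
    """Compute the RFC 7638 SHA-256 thumbprint of an OKP (Ed25519) JWK."""
    # Only the required members are hashed, in lexicographic order, no whitespace.
    required = {"crv": jwk["crv"], "kty": jwk["kty"], "x": jwk["x"]}
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def find_key(directory: Dict[str, Any], keyid: str) -> Optional[Dict[str, Any]]:
    """Return the directory entry whose thumbprint equals keyid, if any."""
    for jwk in directory.get("keys", []):
        if jwk.get("kty") == "OKP" and jwk.get("crv") == "Ed25519":
            if jwk_thumbprint(jwk) == keyid:
                return jwk
    return None  # unknown or rotated-out key: the request cannot be verified
```

Under this model, rotation is a matter of adding or removing entries in the directory; how long a verifier caches an old directory, and what happens to requests signed shortly before a key is removed, remains the open question flagged above.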
