CONCEPT Cited by 2 sources

Declared crawler¶

Definition¶

A declared crawler is an automated web client whose operator has published:

A unique user-agent string that names the operator and bot.
A list of source IP ranges (or equivalent identity credentials, e.g. Web Bot Auth signatures).
A public purpose statement: what the bot is for, how often it crawls, and how to contact the operator if it misbehaves.

Origins can then decide per bot: allow, rate-limit, challenge, or block. Classical examples: Googlebot, Bingbot, AhrefsBot. Modern AI examples: GPTBot, ClaudeBot, CCBot, PerplexityBot / Perplexity-User, ChatGPT-User.
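A minimal sketch of how an origin might turn a declaration into a per-bot decision. Everything here is illustrative: the bot names (`examplebot`, `example-user`), the `REGISTRY` structure, and the `decide` helper are hypothetical, and the CIDR blocks are RFC 5737 documentation ranges rather than any operator's real addresses. It assumes the origin trusts a user-agent claim only when the source IP falls inside that bot's published ranges.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple


class Action(Enum):
    ALLOW = "allow"
    RATE_LIMIT = "rate-limit"
    CHALLENGE = "challenge"
    BLOCK = "block"


@dataclass(frozen=True)
class DeclaredCrawler:
    ua_token: str                # lowercase substring of the published user agent
    ip_ranges: Tuple[str, ...]   # CIDR blocks from the operator's published list
    purpose: str                 # operator's stated purpose
    action: Action               # this origin's decision for the bot


# Hypothetical registry; names and ranges are placeholders, not real bots.
REGISTRY = (
    DeclaredCrawler("examplebot", ("192.0.2.0/24",), "search indexing", Action.ALLOW),
    DeclaredCrawler("example-user", ("198.51.100.0/24",), "on-demand user fetch", Action.RATE_LIMIT),
)


def decide(user_agent: str, client_ip: str) -> Optional[Action]:
    """Return the configured action for a declared crawler.

    Returns Action.CHALLENGE when the user agent claims a declared bot but the
    source IP is outside that bot's published ranges, and None when the
    request does not claim to be any declared bot.
    """
    ua = user_agent.lower()
    ip = ip_address(client_ip)
    for bot in REGISTRY:
        if bot.ua_token in ua:
            if any(ip in ip_network(cidr) for cidr in bot.ip_ranges):
                return bot.action
            return Action.CHALLENGE  # claimed identity, unpublished source IP
    return None
```

A `None` result means the request falls through to the site's normal handling; the user-agent string alone is never treated as proof of identity, since any client can send it.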

Cloudflare's 2025-08-04 post names this as the first of five operator norms for cooperative crawlers: be transparent.

Counterparts¶

Undeclared crawler: a crawler that does not appear in its operator's published list of user agents and IP ranges. It may be innocent (new, internal, or unattributed) or a stealth crawler.
Stealth crawler: a crawler that actively misrepresents its identity.

Separate bots for separate activities¶

Cloudflare's norm #4 in the same post: one activity per crawler. PerplexityBot (training/indexing) versus Perplexity-User (on-demand user fetch) is the documented two-crawler split: it lets a site allow on-demand user fetches while denying training/indexing. Lumping all crawling under a single user agent forces origins into an all-or-nothing decision.
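One way a site could express that split is in robots.txt, sketched here with Python's standard `urllib.robotparser`. The policy text and the `allowed` helper are assumptions for illustration. robots.txt is advisory, so whether a given crawler honors it is up to its operator, and an origin may prefer to enforce the same split at its firewall.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site policy: deny the indexing crawler, allow on-demand fetches.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, path: str = "/article") -> bool:
    """Report whether the policy above permits user_agent to fetch path."""
    return parser.can_fetch(user_agent, path)
```

With a single shared user agent, the two groups above would collapse into one, and the site could only allow or deny both activities together.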

Seen in¶

sources/2025-08-04-cloudflare-perplexity-stealth-undeclared-crawlers: canonical wiki instance; enumerates Perplexity's two declared crawlers and ChatGPT's declared crawler, and contrasts all three with Perplexity's stealth crawler.
sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process: the Googlebot-as-canonical-declared-crawler instance. The Vercel + MERJ study's bot-detection path (patterns/edge-middleware-bot-beacon-injection) relies on Google's published IP-range JSON plus reverse-DNS verification to identify Googlebot. This declared-crawler transparency (User-Agent, published IP ranges, and *.googlebot.com rDNS) is what lets the study measure bot-only traffic cleanly, without conflating it with human traffic or stealth crawlers.
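The reverse-DNS verification mentioned above is commonly done as a forward-confirmed reverse lookup. The following is a rough sketch of that general technique, not the study's actual code: the function name, the injectable `reverse` and `forward` resolvers, and the default suffix list (only the `*.googlebot.com` pattern named above) are assumptions. `socket.gethostbyname_ex` resolves IPv4 only, so an IPv6 check would need a different forward resolver.

```python
import socket
from typing import Callable, List, Tuple


def _reverse(ip: str) -> str:
    # PTR lookup: IP address -> primary hostname
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    # A lookup: hostname -> list of IPv4 addresses
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
    suffixes: Tuple[str, ...] = (".googlebot.com",),
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
        # The PTR name must sit under an expected crawler domain.
        if not host.endswith(suffixes):
            return False
        # The name must resolve back to the same IP; otherwise anyone who
        # controls their own PTR records could forge the hostname.
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

Passing the resolvers as parameters keeps the check testable without network access; in production the defaults would be used, typically with caching in front of them.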

Related¶