CONCEPT Cited by 2 sources
Declared crawler¶
Definition¶
A declared crawler is an automated web client whose operator has published:
- A unique user-agent string that names the operator and bot.
- A list of source IP ranges (or equivalent identity credentials, e.g. Web Bot Auth signatures).
- A public purpose statement: what the bot is for, how often it crawls, and how to contact the operator if it misbehaves.
Origins can then decide per bot: allow, rate-limit, challenge, or block. Classical examples: Googlebot, Bingbot, AhrefsBot. Modern AI examples: GPTBot, ClaudeBot, CCBot, PerplexityBot / Perplexity-User, ChatGPT-User.
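The per-bot decision described above can be sketched as a small lookup table. This is a minimal illustration, not any vendor's implementation: the `Action` enum, the `POLICY` mapping, the specific choices in it, and the `decide` helper are all hypothetical names and values chosen for the example.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    RATE_LIMIT = "rate-limit"
    CHALLENGE = "challenge"
    BLOCK = "block"


# Hypothetical origin policy: one decision per declared bot token.
# The choices are illustrative only, not recommendations.
POLICY = {
    "Googlebot": Action.ALLOW,
    "AhrefsBot": Action.RATE_LIMIT,
    "GPTBot": Action.BLOCK,
}


def decide(user_agent: str, default: Action = Action.ALLOW) -> Action:
    """Return the action for the first policy token found in the User-Agent.

    Matching is a case-insensitive substring test. Clients that match no
    token fall through to `default` (assumed here to be ALLOW).
    """
    ua = user_agent.lower()
    for token, action in POLICY.items():
        if token.lower() in ua:
            return action
    return default
```

A user-agent string alone is trivially spoofable, so a real deployment would pair this lookup with verification against the bot's published IP ranges or other identity credentials.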
Cloudflare's 2025-08-04 post names this as the first of five operator norms for cooperative crawlers: be transparent.
Counterparts¶
- Undeclared crawler — not in the operator's published list. Could be innocent (new / internal / unattributed) or a stealth crawler.
- Stealth crawler — actively misrepresents identity.
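The distinction between these categories can be illustrated with a rough classifier. This is a sketch under stated assumptions: the registry entries (`ExampleBot`, `SampleFetcher`) are made up, the CIDR blocks are RFC 5737 documentation ranges rather than real bot ranges, and the label names are hypothetical. It also cannot identify a stealth crawler directly; a stealth crawler that imitates a browser simply lands in the "undeclared" bucket alongside innocent traffic.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredCrawler:
    ua_token: str   # substring the operator says appears in the User-Agent
    cidrs: tuple    # source IP ranges the operator has published


# Hypothetical registry; names and ranges are placeholders.
REGISTRY = (
    DeclaredCrawler("ExampleBot", ("192.0.2.0/24",)),
    DeclaredCrawler("SampleFetcher", ("198.51.100.0/24",)),
)


def classify(user_agent: str, remote_ip: str) -> str:
    """Label a request by comparing its UA and source IP to the registry."""
    ip = ipaddress.ip_address(remote_ip)
    ua = user_agent.lower()
    for bot in REGISTRY:
        if bot.ua_token.lower() in ua:
            in_range = any(ip in ipaddress.ip_network(c) for c in bot.cidrs)
            # UA claims a declared bot: trust it only from a published range.
            return "declared" if in_range else "unverified-claim"
    # No declared identity claimed: human, innocent bot, or stealth crawler.
    return "undeclared"
```

The "unverified-claim" label covers requests that borrow a declared bot's user-agent from outside its published ranges, which is one form of misrepresented identity that simple checks can catch.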
Separate bots for separate activities¶
Cloudflare's norm #4 in the same post: one activity per crawler. PerplexityBot (training/indexing) vs Perplexity-User (on-demand user fetch) is the documented two-crawler split that lets a site allow on-demand-user fetches but deny training/indexing. Lumping all crawling into a single UA forces origins into an all-or-nothing decision.
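One way a site might express the split described above is through robots.txt, checked here with Python's standard `urllib.robotparser`. The rules, the `example.com` URL, and the `allowed` helper are illustrative assumptions, and robots.txt only constrains crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: deny the indexing crawler, allow on-demand user fetches.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Return True if the parsed rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)
```

Under these rules, `allowed("PerplexityBot", "https://example.com/article")` is `False` while `allowed("Perplexity-User", "https://example.com/article")` is `True`. If both activities shared one user-agent token, no such pair of rules could be written.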
Seen in¶
- sources/2025-08-04-cloudflare-perplexity-stealth-undeclared-crawlers — canonical wiki instance; enumerates Perplexity's two declared crawlers, ChatGPT's declared crawler, and contrasts all three with Perplexity's stealth crawler.
- sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process — Googlebot-as-canonical-declared-crawler instance. The Vercel + MERJ study's bot-detection path (patterns/edge-middleware-bot-beacon-injection) relies on Google's published IP-range JSON + reverse-DNS verification to identify Googlebot. The declared-crawler transparency (User-Agent + published IP ranges + *.googlebot.com rDNS) is what enables the study's clean bot-only measurement without conflating with human traffic or stealth crawlers.
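The reverse-DNS verification mentioned above is commonly done as a forward-confirmed reverse lookup: resolve the IP to a hostname, check the domain, then resolve the hostname back and confirm it returns the same IP. The following is a minimal sketch, not the study's code. The function names are hypothetical, only the `.googlebot.com` suffix named above is accepted (a simplification), and `socket.gethostbyname_ex` handles IPv4 only.

```python
import socket


def reverse_lookup(ip: str) -> str:
    """PTR lookup: return the primary hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> list:
    """A-record lookup: return the IPv4 addresses for a hostname."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(ip: str, reverse=reverse_lookup, forward=forward_lookup) -> bool:
    """Forward-confirmed reverse DNS check against *.googlebot.com.

    The resolvers are injectable so the logic can be exercised offline.
    """
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(".googlebot.com"):
            return False
        # The forward lookup must map back to the original IP; otherwise
        # anyone controlling their own PTR record could pass the check.
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

The forward confirmation step matters because PTR records are controlled by whoever owns the IP block, so a hostname ending in the right domain proves nothing on its own.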