CONCEPT
Declared crawler¶
Definition¶
A declared crawler is an automated web client whose operator has published:
- A unique user-agent string that names the operator and bot.
- A list of source IP ranges (or equivalent identity credentials, e.g. Web Bot Auth signatures).
- A public purpose statement — what the bot is for, how often it crawls, how to contact if it misbehaves.
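The first two published items are what make the identity checkable: a UA string alone can be spoofed, but a UA match combined with a source IP inside the operator's published ranges cannot. A minimal sketch of that check, using hypothetical bot names and TEST-NET IP ranges (real operators publish their own, often in machine-readable form):

```python
import ipaddress

# Hypothetical declared-crawler registry: UA token -> published IP ranges.
# "ExampleBot" and the TEST-NET CIDRs below are illustrative, not real.
DECLARED = {
    "ExampleBot": ["192.0.2.0/24", "198.51.100.0/24"],
}

def is_declared(user_agent: str, source_ip: str) -> bool:
    """True only if the UA names a declared crawler AND the source IP
    falls inside one of that crawler's published ranges."""
    for bot, ranges in DECLARED.items():
        if bot in user_agent:
            ip = ipaddress.ip_address(source_ip)
            return any(ip in ipaddress.ip_network(r) for r in ranges)
    return False  # UA not in any published list: undeclared

# A declared UA arriving from outside the published ranges is a spoof signal.
print(is_declared("Mozilla/5.0 (compatible; ExampleBot/1.0)", "192.0.2.7"))    # True
print(is_declared("Mozilla/5.0 (compatible; ExampleBot/1.0)", "203.0.113.9"))  # False
```

Web Bot Auth replaces the IP-range half of this check with a cryptographic signature, but the shape of the decision is the same: identity claim plus verifiable credential.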
Origins can then decide per bot: allow, rate-limit, challenge, or block. Classical examples: Googlebot, Bingbot, AhrefsBot. Modern AI examples: GPTBot, ClaudeBot, CCBot, PerplexityBot / Perplexity-User, ChatGPT-User.
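The per-bot decision reduces to a policy lookup once identity is verified. A sketch, with an illustrative (not prescriptive) policy table; the sensible default for an unknown or undeclared bot is a challenge rather than a silent allow:

```python
# Illustrative per-bot policy table; the four actions mirror the options
# an origin has once a crawler is reliably identified.
POLICY = {
    "Googlebot": "allow",
    "AhrefsBot": "rate-limit",
    "GPTBot":    "block",
}

def decide(bot_name: str) -> str:
    # Bots with no published identity get challenged, not trusted.
    return POLICY.get(bot_name, "challenge")

print(decide("Googlebot"))   # allow
print(decide("MysteryBot"))  # challenge
```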
Cloudflare's 2025-08-04 post names this as the first of five operator norms for cooperative crawlers: be transparent.
Counterparts¶
- Undeclared crawler — not in the operator's published list. Could be innocent (new / internal / unattributed) or a stealth crawler.
- Stealth crawler — actively misrepresents identity.
Separate bots for separate activities¶
Cloudflare's norm #4 in the same post: one activity per crawler. PerplexityBot (training/indexing) vs. Perplexity-User (on-demand user fetch) is the documented two-crawler split that lets a site allow on-demand user fetches while denying training/indexing. Lumping all crawling under a single UA forces origins into an all-or-nothing decision.
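Because the two activities arrive under distinct user-agent tokens, the split can be expressed directly in robots.txt. A sketch using the tokens named above (directive semantics per RFC 9309; whether a given fetcher honors robots.txt is the operator's documented policy, not guaranteed by the file itself):

```
# Deny the training/indexing crawler...
User-agent: PerplexityBot
Disallow: /

# ...but allow fetches made on demand for a specific user.
User-agent: Perplexity-User
Allow: /
```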
Seen in¶
- sources/2025-08-04-cloudflare-perplexity-stealth-undeclared-crawlers — canonical wiki instance; enumerates Perplexity's two declared crawlers, ChatGPT's declared crawler, and contrasts all three with Perplexity's stealth crawler.