PATTERN (cited by 1 source)
Stealth-on-block fallback¶
Shape (anti-pattern)¶
A crawler operator runs a declared, robots.txt-compliant crawler by default. When that declared crawler is blocked by an origin (via WAF rule, network-layer block, or robots.txt Disallow), the operator falls back to a second, undeclared stealth crawler — spoofed user-agent, unlisted IP pool, rotating ASNs — to fetch the same content anyway.
This is an anti-pattern documented as bad behavior, not a design recommendation. Cloudflare names it explicitly in its 2025-08-04 post as the reason for delisting Perplexity from the Verified Bots program.
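A minimal illustration of the trigger: an origin publishes a robots.txt rule (or an equivalent WAF block) against the declared crawler, and the operator's compliant retrieval path goes dark. The bot name below is Perplexity's real declared crawler; the rule itself is a generic example, not taken from any specific site.

```
# robots.txt — block the declared crawler sitewide
User-agent: PerplexityBot
Disallow: /
```

Under this anti-pattern, the stealth fallback then fetches the same URLs under a browser user-agent from an unlisted IP pool.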
Why operators adopt it¶
- Coverage pressure — the operator's product (an AI answer engine, a scraping aggregator) is judged on how often it can answer user queries with specific content. Giving up on blocked origins degrades answer quality.
- Plausible deniability — the declared crawler appears compliant. A site owner who adds a robots.txt rule and stops seeing PerplexityBot traffic may not notice the stealth traffic unless they look for it.
- Asymmetric cost — the operator pays for proxy / egress pools; the origin pays the debugging cost of figuring out what's hitting them.
Why it fails¶
- ML fingerprinting closes the loop (patterns/stealth-crawler-detection-fingerprint) — request-shape features survive UA + IP + ASN rotation.
- Graceful degradation on successful block is observable — in Cloudflare's test, when the stealth crawler was blocked, Perplexity's answers became measurably less specific, confirming the block was working end-to-end. The stealth path is not free capacity; it's a dependency.
- Verified-bot delisting moves the operator out of the known-good allowlist, applying bot-management scoring to the declared crawler as well. The evasion by the stealth half contaminates the reputation of the declared half.
- Reputation damage — the 2025-08-04 post is a public robots.txt-violation attribution; future pay-per-crawl / licensing negotiations are harder for the operator.
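The fingerprinting point can be made concrete. A minimal sketch under toy assumptions: fingerprint the *shape* of a request (here, just header ordering) rather than its values. Real bot-management systems run ML over far richer features (TLS fingerprints, HTTP/2 settings, timing); the header dicts below are hypothetical examples, not captured traffic.

```python
# Sketch: request-shape fingerprinting that survives UA + IP rotation.
# We fingerprint header *names in emission order*, never header values,
# so spoofing the User-Agent string changes nothing the fingerprint reads.

def shape_fingerprint(headers: dict) -> tuple:
    # Python dicts preserve insertion order (3.7+), standing in for
    # the order headers appear on the wire.
    return tuple(name.lower() for name in headers)

declared = {          # declared crawler's request shape
    "Host": "example.com",
    "User-Agent": "PerplexityBot/1.0",
    "Accept": "*/*",
    "Accept-Encoding": "gzip",
}
stealth = {           # same client stack, spoofed UA, different IP
    "Host": "example.com",
    "User-Agent": "Mozilla/5.0 (Macintosh) Chrome/124.0",
    "Accept": "*/*",
    "Accept-Encoding": "gzip",
}
browser = {           # a real browser emits a different shape
    "Host": "example.com",
    "Accept": "*/*",
    "User-Agent": "Mozilla/5.0 (Macintosh) Chrome/124.0",
    "Accept-Language": "en-US",
    "Accept-Encoding": "gzip, deflate, br",
}

assert shape_fingerprint(declared) == shape_fingerprint(stealth)
assert shape_fingerprint(declared) != shape_fingerprint(browser)
```

Rotating the user-agent string, IP, or ASN changes values the fingerprint never reads, which is why the stealth crawler stays linkable to the declared one.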
Escalation signature¶
What distinguishes stealth-on-block from ordinary multi-homed crawling:
- Timing — stealth activity increases after the declared crawler is blocked, not at a steady rate.
- Reactive IP + ASN rotation — new pools come online in response to new blocks, not as steady-state capacity.
- Content specificity degradation on successful stealth block — proves the stealth path was the operator's actual retrieval path, not a side-channel.
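The timing and rotation signals above can be checked mechanically from origin logs. A minimal sketch with hypothetical data: flag IP/ASN pools whose first appearance clusters shortly after a block event, as opposed to pools that came online as steady-state capacity.

```python
# Sketch: separate reactive pools (first seen shortly after a block)
# from steady-state pools. Timestamps and pool names are made up.

from datetime import datetime, timedelta

block_events = [datetime(2025, 8, 1, 12, 0)]   # declared crawler blocked here

pool_first_seen = {
    "AS64500-pool": datetime(2025, 8, 1, 12, 7),   # minutes after the block
    "AS64501-pool": datetime(2025, 7, 20, 3, 0),   # long before any block
}

def reactive_pools(first_seen, blocks, window=timedelta(hours=1)):
    """Pools whose first appearance falls within `window` after a block."""
    return sorted(
        pool for pool, ts in first_seen.items()
        if any(block <= ts <= block + window for block in blocks)
    )

assert reactive_pools(pool_first_seen, block_events) == ["AS64500-pool"]
```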
Structural fix¶
Norms #4 and #5 from Cloudflare's five operator norms:
- #4 Separate bots for separate activities — publish one bot per use case so origins can allow some and deny others.
- #5 Follow the rules — honor robots.txt, rate limits, and security protections.
Per norm #1 (Be transparent): if a declared crawler is blocked, the operator's response should be to publish a contact, a new bot for a new purpose, or accept the block — not to ship a stealth bypass.
For crawlers that cooperate via cryptographic identity, the Web Bot Auth equivalent is to sign the fallback request honestly; ChatGPT Agent's signed requests are the positive-control instance.
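A sketch of what the honest path looks like on the wire. Web Bot Auth builds on HTTP Message Signatures (RFC 9421): the crawler signs covered request components with a published key and identifies itself via a Signature-Agent header. The helper below is illustrative only: HMAC stands in for the Ed25519 signature a real implementation would use, and the key id and agent value are made up.

```python
# Sketch: sign a request Web Bot Auth-style instead of hiding identity.
# Header names follow RFC 9421 (HTTP Message Signatures); HMAC-SHA256 is a
# stdlib stand-in for the asymmetric signature a verifier would check
# against the operator's published key.

import base64, hashlib, hmac

def signature_base(authority: str, agent: str, params: str) -> str:
    # RFC 9421 signature base: one line per covered component,
    # then the @signature-params pseudo-component.
    return (
        f'"@authority": {authority}\n'
        f'"signature-agent": {agent}\n'
        f'"@signature-params": {params}'
    )

def sign_request(authority: str, agent: str, key: bytes) -> dict:
    params = '("@authority" "signature-agent");keyid="example-key"'
    base = signature_base(authority, agent, params)
    tag = hmac.new(key, base.encode(), hashlib.sha256).digest()
    return {
        "Signature-Agent": agent,
        "Signature-Input": f"sig1={params}",
        "Signature": f"sig1=:{base64.b64encode(tag).decode()}:",
    }

headers = sign_request("example.com", '"crawler.example"', b"demo-key")
assert headers["Signature"].startswith("sig1=:")
```

The point of the sketch: even a fallback fetch carries a verifiable identity an origin can allow or deny, rather than a spoofed one it has to fingerprint.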
Related¶
- concepts/stealth-crawler (concept anchor).
- concepts/user-agent-rotation / concepts/asn-rotation / concepts/ip-rotation-as-evasion.
- concepts/robots-txt-compliance.
- patterns/stealth-crawler-detection-fingerprint / patterns/verified-bot-delisting.
- systems/perplexity-ai — canonical operator.
Seen in¶
- sources/2025-08-04-cloudflare-perplexity-stealth-undeclared-crawlers — canonical wiki instance.