PATTERN

Stealth-on-block fallback

Shape (anti-pattern)

A crawler operator runs a declared, robots.txt-compliant crawler by default. When that declared crawler is blocked by an origin (via WAF rule, network-layer block, or robots.txt Disallow), the operator falls back to a second, undeclared stealth crawler — spoofed user-agent, unlisted IP pool, rotating ASNs — to fetch the same content anyway.

This is an anti-pattern documented as bad behavior, not a design recommendation. Cloudflare names it explicitly in its 2025-08-04 post as the reason for delisting Perplexity from the Verified Bots program.
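The shape can be sketched in a few lines so it is recognizable from the outside. All names here are hypothetical, and this is the control flow origins end up reverse-engineering, not a recommendation:

```python
# Hypothetical sketch of the anti-pattern's control flow, for recognition
# only. DECLARED_UA, declared_fetch, stealth_fetch are illustrative names.
DECLARED_UA = "ExampleBot/1.0 (+https://example.com/bot)"
SPOOFED_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

def fetch_with_stealth_fallback(url, declared_fetch, stealth_fetch):
    """Anti-pattern: any block on the declared identity triggers a retry
    through an undeclared one (spoofed UA, unlisted IP pool)."""
    status, body = declared_fetch(url, ua=DECLARED_UA)
    if status in (401, 403, 429):          # WAF rule, block page, rate limit
        status, body = stealth_fetch(url, ua=SPOOFED_UA)  # evasion path
    return status, body
```

The branch on the block status is the tell: stealth traffic is *conditioned on* the declared crawler being refused, which is exactly what the escalation signature below the fold detects.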

Why operators adopt it

  • Coverage pressure — the operator's product (an AI answer engine, a scraping aggregator) is judged on how often it can answer user queries with specific content. Giving up on blocked origins degrades answer quality.
  • Plausible deniability — the declared crawler appears compliant. A site owner who adds a robots.txt Disallow and stops seeing PerplexityBot traffic may not notice the stealth traffic unless they go looking for it.
  • Asymmetric cost — the operator pays for proxy / egress pools; the origin pays the debugging cost of figuring out what's hitting them.

Why it fails

  • ML fingerprinting closes the loop (patterns/stealth-crawler-detection-fingerprint) — request-shape features survive UA + IP + ASN rotation.
  • Graceful degradation on successful block is observable — in Cloudflare's test, when the stealth crawler was blocked, Perplexity's answers became measurably less specific, confirming the block was working end-to-end. The stealth path is not free capacity; it's a dependency.
  • Verified-bot delisting moves the operator out of the known-good allowlist, so bot-management scoring applies to the declared crawler as well. Evasion by the stealth half contaminates the reputation of the declared half.
  • Reputation damage — the 2025-08-04 post is a public robots.txt-violation attribution; future pay-per-crawl / licensing negotiations are harder for the operator.
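The first failure mode can be made concrete: a request-shape fingerprint built only from features the stealth pool does not rotate. A minimal stdlib sketch, with an illustrative feature choice (production systems use richer TLS and HTTP/2 features):

```python
import hashlib

def request_shape_fingerprint(http_version, header_names):
    """Hash of features that survive UA + IP + ASN rotation: the HTTP
    version and the *order* of header names. Header values are ignored,
    so swapping the User-Agent string changes nothing."""
    shape = http_version + "|" + ",".join(h.lower() for h in header_names)
    return hashlib.sha256(shape.encode()).hexdigest()[:16]

# Declared and stealth requests emitted by the same client stack tend to
# share a shape even when the User-Agent value and source IP differ.
declared_fp = request_shape_fingerprint(
    "HTTP/2", ["host", "user-agent", "accept", "accept-encoding"])
stealth_fp = request_shape_fingerprint(
    "HTTP/2", ["host", "user-agent", "accept", "accept-encoding"])
```

Here the two fingerprints collide because the request shape is identical; only a different client implementation, with a different header ordering, would change the hash.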

Escalation signature

What distinguishes stealth-on-block from ordinary multi-homed crawling:

  • Timing — stealth activity increases after the declared crawler is blocked, not at a steady rate.
  • Reactive IP + ASN rotation — new pools come online in response to new blocks, not as steady-state capacity.
  • Content-specificity degradation on a successful stealth block — proves the stealth path was the operator's actual retrieval path, not a side channel.
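The timing signal can be operationalized as a before/after ratio around each block event. A sketch, where the window size and any alerting threshold are assumptions:

```python
def escalation_ratio(request_times, block_time, window):
    """Requests from a suspect pool in the window after a block event,
    divided by requests in the window before it. Reactive pools spike
    (ratio >> 1); steady-state multi-homed capacity stays near 1."""
    before = sum(1 for t in request_times
                 if block_time - window <= t < block_time)
    after = sum(1 for t in request_times
                if block_time <= t < block_time + window)
    return after / max(before, 1)  # avoid division by zero
```

A ratio well above 1, recurring across multiple block events, distinguishes stealth-on-block from a pool that was always sending at that rate.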

Structural fix

Norms #4 and #5 from Cloudflare's five operator norms:

  • #4 Separate bots for separate activities — publish one bot per use case so origins can allow some and deny others.
  • #5 Follow the rules — honor robots.txt, rate limits, and security protections.

Per norm #1 (Be transparent): if a declared crawler is blocked, the operator's response should be to publish a contact, a new bot for a new purpose, or accept the block — not to ship a stealth bypass.

For crawlers that cooperate via cryptographic identity, the Web Bot Auth equivalent is to sign the fallback request honestly; ChatGPT Agent's signed requests are the positive-control instance.
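A toy of that honest alternative, modeled loosely on HTTP Message Signatures (RFC 9421), which Web Bot Auth builds on. Real deployments sign with an Ed25519 key whose public half is discoverable by the origin; HMAC-SHA256 stands in here so the sketch runs on the stdlib alone, and all names are illustrative:

```python
import base64
import hashlib
import hmac

def sign_fallback_request(method, authority, path, key, keyid):
    """Attach a verifiable identity to the retry instead of hiding it.
    The signature base covers derived components, RFC 9421-style.
    HMAC is a stand-in for the Ed25519 signing real deployments use."""
    base = (f'"@method": {method}\n'
            f'"@authority": {authority}\n'
            f'"@path": {path}')
    tag = hmac.new(key, base.encode(), hashlib.sha256).digest()
    return {
        "Signature-Input":
            f'sig1=("@method" "@authority" "@path");keyid="{keyid}"',
        "Signature": "sig1=:" + base64.b64encode(tag).decode() + ":",
    }
```

The point of the structure is that the origin can recompute the signature base from the request it received and verify the identity, which is the opposite of the stealth fallback: the retry carries more attributable identity, not less.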
