PATTERN

Stealth-on-block fallback

Shape (anti-pattern)

A crawler operator runs a declared, robots.txt-compliant crawler by default. When that declared crawler is blocked by an origin (via WAF rule, network-layer block, or robots.txt Disallow), the operator falls back to a second, undeclared stealth crawler — spoofed user-agent, unlisted IP pool, rotating ASNs — to fetch the same content anyway.

This is an anti-pattern documented as bad behavior, not a design recommendation. Cloudflare names it explicitly in its 2025-08-04 post as the reason for delisting Perplexity from the Verified Bots program.
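The shape can be sketched in a few lines so it is recognizable from the outside. All names here are hypothetical, and this is the control flow origins end up reverse-engineering, not a recommendation:

```python
# Hypothetical sketch of the anti-pattern's control flow, for recognition
# only. DECLARED_UA, declared_fetch, stealth_fetch are illustrative names.
DECLARED_UA = "ExampleBot/1.0 (+https://example.com/bot)"
SPOOFED_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

def fetch_with_stealth_fallback(url, declared_fetch, stealth_fetch):
    """Anti-pattern: any block on the declared identity triggers a retry
    through an undeclared one (spoofed UA, unlisted IP pool)."""
    status, body = declared_fetch(url, ua=DECLARED_UA)
    if status in (401, 403, 429):          # WAF rule, block page, rate limit
        status, body = stealth_fetch(url, ua=SPOOFED_UA)  # evasion path
    return status, body
```

The branch on the block status is the tell: stealth traffic is *conditioned on* the declared crawler being refused, which is exactly what the escalation signature below the fold detects.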

Why operators adopt it

  • Coverage pressure — the operator's product (an AI answer engine, a scraping aggregator) is judged on how often it can answer user queries with specific content. Giving up on blocked origins degrades answer quality.
  • Plausible deniability — the declared crawler appears compliant. A site owner who adds a robots.txt Disallow and stops seeing PerplexityBot traffic may not notice the stealth traffic unless they go looking for it.
  • Asymmetric cost — the operator pays for proxy / egress pools; the origin pays the debugging cost of figuring out what's hitting them.

Why it fails

  • ML fingerprinting closes the loop (patterns/stealth-crawler-detection-fingerprint) — request-shape features survive UA + IP + ASN rotation.
  • Graceful degradation on successful block is observable — in Cloudflare's test, when the stealth crawler was blocked, Perplexity's answers became measurably less specific, confirming the block was working end-to-end. The stealth path is not free capacity; it's a dependency.
  • Verified-bot delisting moves the operator out of the known-good allowlist, so bot-management scoring applies to the declared crawler as well. Evasion by the stealth half contaminates the reputation of the declared half.
  • Reputation damage — the 2025-08-04 post is a public robots.txt-violation attribution; future pay-per-crawl / licensing negotiations are harder for the operator.
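The first failure mode can be made concrete: a request-shape fingerprint built only from features the stealth pool does not rotate. A minimal stdlib sketch, with an illustrative feature choice (production systems use richer TLS and HTTP/2 features):

```python
import hashlib

def request_shape_fingerprint(http_version, header_names):
    """Hash of features that survive UA + IP + ASN rotation: the HTTP
    version and the *order* of header names. Header values are ignored,
    so swapping the User-Agent string changes nothing."""
    shape = http_version + "|" + ",".join(h.lower() for h in header_names)
    return hashlib.sha256(shape.encode()).hexdigest()[:16]

# Declared and stealth requests emitted by the same client stack tend to
# share a shape even when the User-Agent value and source IP differ.
declared_fp = request_shape_fingerprint(
    "HTTP/2", ["host", "user-agent", "accept", "accept-encoding"])
stealth_fp = request_shape_fingerprint(
    "HTTP/2", ["host", "user-agent", "accept", "accept-encoding"])
```

Here the two fingerprints collide because the request shape is identical; only a different client implementation, with a different header ordering, would change the hash.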

Escalation signature

What distinguishes stealth-on-block from ordinary multi-homed crawling:

  • Timing — stealth activity increases after the declared crawler is blocked, not at a steady rate.
  • Reactive IP + ASN rotation — new pools come online in response to new blocks, not as steady-state capacity.
  • Content-specificity degradation on a successful stealth block — proves the stealth path was the operator's actual retrieval path, not a side channel.
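The timing signal can be operationalized as a before/after ratio around each block event. A sketch, where the window size and any alerting threshold are assumptions:

```python
def escalation_ratio(request_times, block_time, window):
    """Requests from a suspect pool in the window after a block event,
    divided by requests in the window before it. Reactive pools spike
    (ratio >> 1); steady-state multi-homed capacity stays near 1."""
    before = sum(1 for t in request_times
                 if block_time - window <= t < block_time)
    after = sum(1 for t in request_times
                if block_time <= t < block_time + window)
    return after / max(before, 1)  # avoid division by zero
```

A ratio well above 1, recurring across multiple block events, distinguishes stealth-on-block from a pool that was always sending at that rate.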

Structural fix

Norms #4 and #5 from Cloudflare's five operator norms:

  • #4 Separate bots for separate activities — publish one bot per use case so origins can allow some and deny others.
  • #5 Follow the rules — honor robots.txt, rate limits, and security protections.

Per norm #1 (Be transparent): if a declared crawler is blocked, the operator's response should be to publish a contact, a new bot for a new purpose, or accept the block — not to ship a stealth bypass.

For crawlers that cooperate via cryptographic identity, the Web Bot Auth equivalent is to sign the fallback request honestly; ChatGPT Agent's signed requests are the positive-control instance.
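A toy of that honest alternative, modeled loosely on HTTP Message Signatures (RFC 9421), which Web Bot Auth builds on. Real deployments sign with an Ed25519 key whose public half is discoverable by the origin; HMAC-SHA256 stands in here so the sketch runs on the stdlib alone, and all names are illustrative:

```python
import base64
import hashlib
import hmac

def sign_fallback_request(method, authority, path, key, keyid):
    """Attach a verifiable identity to the retry instead of hiding it.
    The signature base covers derived components, RFC 9421-style.
    HMAC is a stand-in for the Ed25519 signing real deployments use."""
    base = (f'"@method": {method}\n'
            f'"@authority": {authority}\n'
            f'"@path": {path}')
    tag = hmac.new(key, base.encode(), hashlib.sha256).digest()
    return {
        "Signature-Input":
            f'sig1=("@method" "@authority" "@path");keyid="{keyid}"',
        "Signature": "sig1=:" + base64.b64encode(tag).decode() + ":",
    }
```

The point of the structure is that the origin can recompute the signature base from the request it received and verify the identity, which is the opposite of the stealth fallback: the retry carries more attributable identity, not less.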
