Skip to content

PATTERN Cited by 1 source

Stealth crawler detection fingerprint

Shape

When a crawler operator evades identity-layer enforcement (UA spoofing, IP rotation, ASN rotation), build an ML classifier over content-independent request features that produces a stable detection signature the attacker cannot cheaply rotate around. Ship the signature as a customer-deployable block rule.

Structural steps

  1. Collect labels. Use controlled-experiment traces (patterns/brand-new-domain-experiment), cooperative- crawler declarations, honeypot domains, and retrospective analysis of customer complaints to produce positive and negative examples of the stealth operator's traffic.
  2. Feature engineering over network-level signals. TLS fingerprints (JA3 / JA4), HTTP/2 frame ordering, request timing distributions, IP reputation, ASN shape across a session, cross-domain request-graph patterns. Avoid features the attacker controls cheaply (UA, referer, cookie values).
  3. Train a classifier. Typical choice: gradient-boosted trees on request-level features producing a bot score; the inference is fast enough to run in the edge request path.
  4. Validate against retention of the detection under known evasion tactics — retrain when the attacker rotates features.
  5. Ship as a managed rule ( Cloudflare-managed rule) so customers get the protection without building their own ML stack.
  6. Propagate fingerprints via patterns/gossip-fingerprint-propagation so a newly- learned signature defends the whole POP fleet, not just the POP that observed the attack.

Complements

Canonical instance

Cloudflare's Perplexity stealth-crawler signature (2025-08-04). The post discloses:

  • "We were able to fingerprint this crawler using a combination of machine learning and network signals."
  • All stealth-UA traffic scored as bot; failed managed challenges.
  • Block signatures added to the managed AI-bots ruleset, available to all customers including free tier.
  • Survives Perplexity's IP + ASN rotation.

The post does not publish the feature list — deliberate, because publication accelerates evasion iteration.

The adversarial feedback loop

The pattern is point-in-time, not terminal:

"Once this post is live the behavior we saw will almost certainly change, and the methods we use to stop them will keep evolving as well."

Each fingerprint buys a window; the operator iterates; the defender iterates. The steady-state posture is continuous retraining + propagation, not a one-shot detection.

Seen in

Last updated · 200 distilled / 1,178 read