PATTERN Cited by 2 sources

Stealth crawler detection fingerprint¶

Shape¶

When a crawler operator evades identity-layer enforcement (UA spoofing, IP rotation, ASN rotation), build an ML classifier over content-independent request features that produces a stable detection signature the attacker cannot cheaply rotate around. Ship the signature as a customer-deployable block rule.

Structural steps¶

Collect labels. Use controlled-experiment traces (patterns/brand-new-domain-experiment), cooperative-crawler declarations, honeypot domains, and retrospective analysis of customer complaints to produce positive and negative examples of the stealth operator's traffic.
Engineer features over network-level signals. Candidates include TLS fingerprints (JA3 / JA4), HTTP/2 frame ordering, request timing distributions, IP reputation, ASN shape across a session, and cross-domain request-graph patterns. Avoid features the attacker controls cheaply (UA, referer, cookie values).
Train a classifier. Typical choice: gradient-boosted trees on request-level features producing a bot score; the inference is fast enough to run in the edge request path.
Validate that the detection survives known evasion tactics, and retrain when the attacker rotates features.
Ship as a managed rule (Cloudflare-managed rule) so customers get the protection without building their own ML stack.
Propagate fingerprints via patterns/gossip-fingerprint-propagation so a newly learned signature defends the whole POP fleet, not just the POP that observed the attack.
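
The feature-engineering and validation steps above can be sketched in a few lines of Python. This is a minimal illustration, not Cloudflare's feature set (which is unpublished): the `Request` fields, the `session_features` and `rotate_identity` helpers, and the assumption that requests have already been grouped into a session are all hypothetical. The point it shows is that a feature vector built only from costly-to-change signals stays identical when the operator swaps its User-Agent and IP addresses.

```python
from dataclasses import dataclass, replace
from statistics import mean, pstdev
from typing import Dict, List


@dataclass(frozen=True)
class Request:
    ts: float          # arrival time, seconds
    ja4: str           # TLS fingerprint, treated as an opaque label
    h2_order: str      # HTTP/2 frame-ordering label, opaque
    asn: int
    host: str
    user_agent: str    # cheap to spoof: never read by session_features
    ip: str            # cheap to rotate: never read by session_features


def session_features(reqs: List[Request]) -> Dict[str, float]:
    """Summarise one pre-grouped session using only costly-to-change signals."""
    ordered = sorted(reqs, key=lambda r: r.ts)
    gaps = [b.ts - a.ts for a, b in zip(ordered, ordered[1:])]
    return {
        "gap_mean": mean(gaps) if gaps else 0.0,
        "gap_stdev": pstdev(gaps) if gaps else 0.0,
        "distinct_ja4": float(len({r.ja4 for r in ordered})),
        "distinct_h2_order": float(len({r.h2_order for r in ordered})),
        "distinct_asn": float(len({r.asn for r in ordered})),
        "distinct_hosts": float(len({r.host for r in ordered})),
    }


def rotate_identity(reqs: List[Request], user_agent: str) -> List[Request]:
    """Simulate cheap evasion: a new UA and a fresh IP on every request."""
    return [
        replace(r, user_agent=user_agent, ip=f"198.51.100.{i + 1}")
        for i, r in enumerate(reqs)
    ]


# Four requests, 0.5 s apart, each to a different (made-up) domain.
session = [
    Request(ts=0.5 * i, ja4="ja4-a", h2_order="h2-a", asn=64500,
            host=f"site{i}.example", user_agent="DeclaredBot/1.0",
            ip="192.0.2.10")
    for i in range(4)
]

baseline = session_features(session)
evaded = session_features(rotate_identity(session, "Mozilla/5.0"))
print(baseline == evaded)  # True: UA and IP rotation leave the features untouched
```

A vector like this would then feed whatever classifier is trained in step 3. Note that `distinct_asn` is itself a feature here, so ASN rotation inside a session changes the vector rather than hiding the session, which is one reading of "ASN shape across a session".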

Complements¶

patterns/verified-bot-delisting — the policy-layer enforcement. Delisting flips the default posture from "allow known bot" to "run bot-management scoring"; the ML fingerprint is what makes scoring produce a non-zero signal.
patterns/brand-new-domain-experiment — the labeling methodology that produces high-confidence positives.
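
How delisting and the fingerprint interact can be shown with a small decision function. This is a hedged sketch: the `decide` function, the `Action` enum, the thresholds, and the convention that `bot_likelihood` runs from 0 to 1 with higher meaning more bot-like are all assumptions made for illustration, not any vendor's actual scoring scale or rule syntax.

```python
from enum import Enum
from typing import Optional, Set


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


def decide(operator: Optional[str],
           verified: Set[str],
           delisted: Set[str],
           bot_likelihood: float,
           challenge_at: float = 0.5,
           block_at: float = 0.9) -> Action:
    """Policy layer first, then scoring.

    A verified operator that has not been delisted skips scoring entirely.
    Everyone else, including a delisted operator, is judged on the score.
    """
    if operator is not None and operator in verified and operator not in delisted:
        return Action.ALLOW
    if bot_likelihood >= block_at:
        return Action.BLOCK
    if bot_likelihood >= challenge_at:
        return Action.CHALLENGE
    return Action.ALLOW


verified = {"example-bot"}

# Still listed: a high score is never consulted.
print(decide("example-bot", verified, set(), 0.99))            # Action.ALLOW
# Delisted, fingerprint fires: scoring now blocks.
print(decide("example-bot", verified, {"example-bot"}, 0.99))  # Action.BLOCK
# Delisted, but no fingerprint signal: delisting alone changes nothing.
print(decide("example-bot", verified, {"example-bot"}, 0.0))   # Action.ALLOW
```

The third call is the reason the two patterns are complements: flipping the default posture has no effect unless the classifier gives the scoring path something to act on.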

Canonical instance¶

Cloudflare's Perplexity stealth-crawler signature (2025-08-04). The post discloses:

"We were able to fingerprint this crawler using a combination of machine learning and network signals."
All stealth-UA traffic scored as bot and failed managed challenges.
Block signatures added to the managed AI-bots ruleset, available to all customers including free tier.
Survives Perplexity's IP + ASN rotation.

The post does not publish the feature list — deliberate, because publication accelerates evasion iteration.

The adversarial feedback loop¶

The pattern is point-in-time, not terminal:

"Once this post is live the behavior we saw will almost certainly change, and the methods we use to stop them will keep evolving as well."

Each fingerprint buys a window; the operator iterates; the defender iterates. The steady-state posture is continuous retraining + propagation, not a one-shot detection.
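
One way to operationalise "continuous retraining" is to track how much of the operator's freshly labelled traffic the current signature still catches, and to trigger a retrain when that fraction falls below a floor. The sketch below assumes a score in [0, 1] where higher means more bot-like; the function names, the 0.9 block threshold, and the 0.95 recall floor are hypothetical choices, not values from either source.

```python
from typing import Sequence


def recall_at(scores: Sequence[float], threshold: float) -> float:
    """Fraction of known-positive requests scored at or above the threshold."""
    if not scores:
        raise ValueError("need at least one labelled positive")
    return sum(1 for s in scores if s >= threshold) / len(scores)


def needs_retrain(fresh_positive_scores: Sequence[float],
                  threshold: float = 0.9,
                  min_recall: float = 0.95) -> bool:
    """True once the current model misses too many fresh positives."""
    return recall_at(fresh_positive_scores, threshold) < min_recall


# Scores the current model assigns to newly labelled stealth-crawler requests.
before_rotation = [0.97, 0.95, 0.99, 0.92]
after_rotation = [0.97, 0.40, 0.35, 0.93]   # operator changed something

print(needs_retrain(before_rotation))  # False: recall is 4/4
print(needs_retrain(after_rotation))   # True: recall dropped to 2/4
```

The fresh positives would come from the same labelling sources as the original training set, for example a rerun of the brand-new-domain experiment.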

Seen in¶

sources/2025-08-04-cloudflare-perplexity-stealth-undeclared-crawlers — canonical wiki instance.
sources/2026-04-21-vercel-botid-deep-analysis-catches-a-sophisticated-bot-network-in-real-time — sibling instance at a different layer. Vercel / Kasada's BotID Deep Analysis fingerprints browser telemetry rather than TLS / HTTP network features and triggers on cross-session proxy-node correlation rather than UA-declaration mismatch. Response is per-session forced re-verification (patterns/correlation-triggered-reverification) rather than a managed-ruleset signature. Together the two sources show the pattern generalises across vendor, feature space, and response mechanism.
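
The sibling instance's trigger-and-response shape (a correlation across sessions leading to per-session re-verification) can be illustrated generically. This is not Vercel's or Kasada's published logic: the tuple layout, the grouping key, the `sessions_to_reverify` name, and the `min_sessions` threshold are all assumptions chosen to make the idea concrete.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

# (session_id, proxy_node, telemetry_fingerprint), all opaque labels.
Observation = Tuple[str, str, str]


def sessions_to_reverify(observations: Iterable[Observation],
                         min_sessions: int = 3) -> Set[str]:
    """Flag sessions that share a proxy node and a telemetry fingerprint.

    A single session is never flagged on its own; the signal is that several
    nominally unrelated sessions look identical behind the same node.
    """
    groups: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    for session_id, proxy_node, fingerprint in observations:
        groups[(proxy_node, fingerprint)].add(session_id)

    flagged: Set[str] = set()
    for sessions in groups.values():
        if len(sessions) >= min_sessions:
            flagged |= sessions
    return flagged


observed = [
    ("s1", "node-a", "fp-1"),
    ("s2", "node-a", "fp-1"),
    ("s3", "node-a", "fp-1"),
    ("s4", "node-a", "fp-2"),   # same node, different telemetry
    ("s5", "node-b", "fp-1"),   # same telemetry, different node
]

print(sorted(sessions_to_reverify(observed)))  # ['s1', 's2', 's3']
```

Each flagged session would then be forced through re-verification individually, in contrast to the fleet-wide block signature of the canonical instance.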