Fleet-wide retroactive threat hunt
A fleet-wide retroactive threat hunt is the practice of responding to any critical vulnerability disclosure by searching historical fleet-wide logs backwards in time for signs that the vulnerability was exploited before it became publicly known. The pattern runs in parallel with live-traffic detection and runtime mitigation; it is the forensic-evidence pillar of the assume-compromise posture. The canonical articulation on this wiki comes from the 2026-05-07 Copy Fail response post: after declaring a security incident on 2026-04-30 03:14 UTC, hours after public disclosure, Cloudflare searched 48 hours of fleet-wide kernel logs for the Copy Fail exploit's distinctive signature.
Structure
The pattern has five investigative pillars, searched in parallel within the retroactive window:
- Kernel / application logs. The exploit's distinctive trace in kernel logs is searched across the fleet for a chosen window (Copy Fail case: 48 hours). Centralised logging is the substrate — without fleet-wide log aggregation, this pillar is infeasible at scale.
- Access logs. Who connected to potentially-affected systems, when, what commands they ran. Reconstructs the interactive activity timeline for the window, independent of automated exploit signatures.
- Binary integrity. Cryptographic hashes of system binaries compared against known-good package manifests. For page-cache-poisoning exploits (Copy Fail) this check is complementary: on-disk hashes remain clean because the tainting is in-memory only, so binary integrity is a hygienic check rather than the primary signal.
- Persistence audit. Look for common post-exploitation persistence (cron jobs, systemd units, shell profile modifications, loaded kernel modules, unusual setuid binaries, scheduled tasks).
- Network connection audit. Unusual egress, lateral-movement indicators, unexpected inbound connections, data-exfiltration signatures.
All five pillars must come back clean before asserting "no compromise". Any single pillar raising a signal triggers full incident-response escalation.
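The all-or-nothing rule above can be sketched as a small aggregation step. This is a minimal illustration, not the source's tooling: the `Pillar` enum, `PillarResult`, and `hunt_verdict` are hypothetical names, and the three-way outcome (clean / escalate / incomplete) is an assumption about how pending pillars would be handled.

```python
from dataclasses import dataclass
from enum import Enum


class Pillar(Enum):
    KERNEL_LOGS = "kernel / application logs"
    ACCESS_LOGS = "access logs"
    BINARY_INTEGRITY = "binary integrity"
    PERSISTENCE = "persistence audit"
    NETWORK = "network connection audit"


@dataclass(frozen=True)
class PillarResult:
    pillar: Pillar
    findings: tuple = ()  # empty tuple means this pillar came back clean


def hunt_verdict(results):
    """Return 'escalate', 'incomplete', or 'clean' for a list of PillarResult."""
    # A signal on any single pillar escalates, even if other pillars are pending.
    if any(r.findings for r in results):
        return "escalate"
    # "No compromise" may only be asserted once all five pillars have reported.
    if {r.pillar for r in results} != set(Pillar):
        return "incomplete"
    return "clean"


# Usage: four clean pillars are not enough to assert "clean".
partial = [PillarResult(p) for p in list(Pillar)[:4]]
verdict = hunt_verdict(partial)  # "incomplete"
```

Treating a missing pillar as "incomplete" rather than "clean" keeps the absence of evidence from one workstream from being read as evidence of absence.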
When it fits
- A critical vulnerability has just been publicly disclosed (or is about to be).
- The vulnerability could plausibly have been exploited by well-resourced adversaries before public disclosure.
- The fleet has centralised fleet-wide logging with retention covering at least the retroactive window.
- The exploit has a distinctive signature in at least one of the five pillars (most do: kernel logs, access patterns, process-chain anomalies).
- The team has dedicated security engineers to run the investigation within hours.
When it doesn't fit
- No centralised logging. Without fleet-wide log aggregation, retroactive search is infeasible at scale.
- Short log retention. If logs have rotated out before the search starts, the retroactive window is zero.
- Silent exploits with no log trace. Exploits that leave no kernel or application-log signature are invisible to pillars 1 and 2; the investigation must rely on pillars 3–5.
- Time-bounded incident. If the incident is still ongoing and mitigation is more urgent than historical forensics, the retroactive hunt can wait a few hours until the containment workstream stabilises.
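The retention conditions in this list can be checked before the hunt starts. The sketch below assumes the window is measured backwards from the moment the hunt begins (the source does not say where the 48 hours are anchored) and that the timestamp of the oldest retained log is known; `retained_coverage` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def retained_coverage(hunt_start, window, oldest_retained):
    """Return (search_start, covered) for a retroactive window.

    covered is the part of the requested window that retained logs can
    still answer for; it is zero if everything has already rotated out.
    """
    wanted_start = hunt_start - window
    # Clamp the start to what retention still holds, and never past hunt_start.
    search_start = min(max(wanted_start, oldest_retained), hunt_start)
    return search_start, hunt_start - search_start


# Usage with the Copy Fail timestamps; the retention boundary is made up.
hunt_start = datetime(2026, 4, 30, 3, 14, tzinfo=timezone.utc)
window = timedelta(hours=48)
oldest = hunt_start - timedelta(hours=24)  # assumption: only 24 h retained
search_start, covered = retained_coverage(hunt_start, window, oldest)
fully_covered = covered == window  # False here: half the window is gone
```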
Structural properties
- Runs in parallel with other incident-response workstreams. Blast-radius mapping, detection coverage validation, runtime mitigation engineering, and patched-kernel rollout proceed alongside the retroactive threat hunt. Each is a distinct team / workstream.
- Window length is a parameter. Cloudflare chose 48 hours for Copy Fail; longer windows cover more potential pre-disclosure exploitation but require proportional log retention and search cost.
- Fleet-wide aggregation is load-bearing. Per-host log search doesn't scale to 100,000+ servers. The pillars rely on centralised logging infrastructure (e.g. Splunk / Elastic / ClickHouse / big-data platforms).
- Complementary to live detection, not substitutive. Live behavioural detection runs continuously and catches new exploitation as it happens. The retroactive hunt catches pre-detection or pre-disclosure exploitation that slipped through. Both are needed.
Canonical instance: Copy Fail 48-hour hunt (2026-04-30 03:14 UTC)
- Trigger: Copy Fail disclosed 2026-04-29 16:00 UTC.
- Declaration: Security incident declared 2026-04-30 03:14 UTC. Threat-hunt workstream begins.
- Window: 48 hours retroactive. Covers period before public disclosure — the threat-model concern being that well-resourced adversaries may have known about the vulnerability pre-disclosure.
- Pillar 1 (kernel logs): The exploit leaves a distinctive trace in kernel logs when it runs. Searched fleet-wide centralised logs for that trace across the 48-hour window.
- Pillar 2 (access logs): Pulled for affected systems; reconstructed who connected, when, and what commands they ran.
- Pillar 3 (binary integrity): System binaries (e.g. /usr/bin/su, the canonical Copy Fail target) validated against known-good package manifests.
- Pillar 4 (persistence audit): Looked for common post-exploitation persistence mechanisms.
- Pillar 5 (network audit): Audited network connections for anything unusual.
- Result: "Everything was clean." Canonical wiki instance of a clean result on all five pillars.
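The binary-integrity pillar can be illustrated with a plain hash comparison. This is a sketch under stated assumptions, not Cloudflare's tooling: the manifest is assumed to be a simple mapping from path to expected SHA-256 hex digest, and `sha256_of` and `verify_binaries` are hypothetical helper names. As the pillar description notes, for Copy Fail this is a hygiene check rather than the primary signal.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large binaries are not loaded at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binaries(manifest):
    """Return {path: reason} for every manifest entry that fails the check.

    manifest: dict mapping file path -> expected SHA-256 hex digest
    (an assumed format; real package manifests differ by distribution).
    """
    failures = {}
    for path, expected in manifest.items():
        if not Path(path).is_file():
            failures[path] = "missing"
        elif sha256_of(path) != expected.lower():
            failures[path] = "hash mismatch"
    return failures
```

An empty result means every listed binary matched its manifest entry; any non-empty result would count as a signal on this pillar.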
Failure modes
- Window too short. An adversary who exploited the vulnerability before the start of the retroactive window (for a 48-hour window, more than 48 hours before the point the search is anchored at) escapes the search. Extending the window increases search cost and requires log retention to match.
- Log retention gap. If logs rotate out before the search starts, pillars 1 and 2 have nothing to search.
- Signature too broad. Searching for too-generic indicators ("any AF_ALG socket open") surfaces huge false-positive counts and drowns signal in noise. Prefer tight signatures from the specific exploit disclosure.
- Signature too narrow. A signature tied to one exploit variant may miss adapted variants. Live behavioural detection provides the complementary, variant-agnostic coverage.
- Centralised logging bottleneck. At fleet scale, retroactive search itself can stress the logging infrastructure. Schedule with awareness of the logging system's capacity.
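The broad-versus-tight trade-off can be seen on a toy data set. Everything in this sketch is illustrative: the log lines are synthetic, `example-exploit-marker` is a placeholder rather than the real Copy Fail trace (which must come from the exploit disclosure), and `hosts_matching` is a hypothetical helper standing in for a query against the centralised logging system.

```python
import re
from collections import Counter

# HYPOTHETICAL signatures, for illustration only.
BROAD = re.compile(r"AF_ALG")                          # any AF_ALG mention
TIGHT = re.compile(r"AF_ALG.*example-exploit-marker")  # placeholder marker


def hosts_matching(pattern, records):
    """Count matching log lines per host; records yields (host, line) pairs."""
    hits = Counter()
    for host, line in records:
        if pattern.search(line):
            hits[host] += 1
    return hits


# Synthetic sample records, not real kernel log output.
records = [
    ("host-a", "sample-event AF_ALG socket open pid=101"),
    ("host-b", "sample-event AF_ALG socket open pid=202"),
    ("host-b", "sample-event AF_ALG example-exploit-marker pid=203"),
    ("host-c", "sample-event link up"),
]

broad_hits = hosts_matching(BROAD, records)  # flags host-a and host-b
tight_hits = hosts_matching(TIGHT, records)  # flags only host-b
```

On a real fleet the broad pattern would flag every host with legitimate AF_ALG use, while the tight pattern narrows triage to hosts showing the specific trace, at the cost of missing variants that alter it.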
Sibling patterns
- Assume-compromise posture: the standing attitude the threat hunt executes on. The pattern is the posture's operational realisation.
- Behavioral detection: the live-traffic counterpart. Runs continuously and catches new exploitation. The retroactive hunt catches what happened before live detection fired.
- Log-based forensic analysis: a general incident-response primitive. The retroactive threat hunt is a specific, structured application of it, tied to the assume-compromise posture and to critical vulnerability disclosure.
Seen in
- 2026-05-07 — Cloudflare Copy Fail response. Canonical wiki first-class page. Security incident declared 2026-04-30 03:14 UTC, hours after Copy Fail disclosure. 48-hour window across fleet-wide kernel logs searched for the exploit's distinctive trace; access logs reconstructed; binary integrity validated; persistence mechanisms audited; network connections audited. Result: "Everything was clean." Load-bearing framing: "Our security team operates on a simple principle for critical vulnerabilities: assume compromise until you can prove otherwise." (Source: sources/2026-05-07-cloudflare-copy-fail-linux-vulnerability-response)