Autonomous distributed mitigation
Autonomous distributed mitigation is the architectural posture of running threat detection and mitigation on every node of a distributed fleet, letting each node act independently with no human in the loop, rather than funneling traffic or decisions through a central scrubbing / analytics / review tier.
Contrast two shapes:
- Centralised scrubbing: traffic is diverted to a dedicated scrubbing facility that runs the detection and mitigation logic; ingress capacity is the bottleneck; human operators review alerts; failure of the scrubbing tier = failure of mitigation.
- Autonomous distributed mitigation: every edge node runs the full detection + mitigation loop; packets are dropped at the node that received them; no central tier to target; no human review needed for the common case.
The canonical wiki instance is Cloudflare's 7.3 Tbps DDoS writeup: "Our systems successfully blocked this record-breaking 7.3 Tbps DDoS attack fully autonomously without requiring any human intervention, without triggering any alerts, and without causing any incidents" — across 477 data centres / 293 locations.
Structural recipe
- Anycast as the delivery fabric — ensures traffic lands where capacity is, not where a scrubber is. An attacker can't beat your POP density by being geographically diverse.
- Every service on every node — not a specialised DDoS-scrubbing fleet; the same servers that serve legitimate traffic also run the DDoS engine. Capacity is shared; failure modes are shared.
- Kernel-data-plane drop — mitigation action executes at XDP/eBPF (or equivalent) line-rate. The cost of running detection on every packet on every node is tolerable only if mitigation is cheap.
- Heuristic + streaming detection — pattern generation runs in user space (systems/dosd) with full streaming algorithms; the kernel filter is compiled from the user-space decision (see patterns/two-stage-evaluation).
- Peer gossip for shared intelligence — top fingerprints propagate via gossip/multicast so every node benefits from every other node's samples (patterns/gossip-fingerprint-propagation). This avoids turning into N independent detectors re-solving the same problem.
- Auto-expiry — mitigation rules time out when hits decay; no operator cleanup.
- Customer surface as managed rulesets — customers tune sensitivity; they do not author the kernel programs. The system is an abstraction, not a toolkit.
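The recipe above can be sketched end-to-end in miniature. This is a hypothetical toy, not dosd: a streaming counter plays the user-space detector, a dict of live rules stands in for the compiled XDP drop program, and auto-expiry removes rules when hits decay. All names, keys, and thresholds are invented for illustration.

```python
import time
from collections import Counter

EXPIRY_SECS = 30          # illustrative rule lifetime without fresh hits
ATTACK_THRESHOLD = 1000   # illustrative sampled-packets-per-key trigger

class Node:
    """Toy per-node detect + mitigate loop (all values illustrative)."""

    def __init__(self):
        self.counts = Counter()   # user-space streaming counts per fingerprint
        self.rules = {}           # fingerprint -> last-hit timestamp

    def sample(self, pkt):
        """Slow path: count sampled packets by a toy fingerprint key."""
        fp = (pkt["proto"], pkt["dst_port"], pkt["len"])
        self.counts[fp] += 1
        if self.counts[fp] >= ATTACK_THRESHOLD and fp not in self.rules:
            self.install(fp)

    def install(self, fp):
        """Stands in for compiling + loading a kernel drop rule."""
        self.rules[fp] = time.monotonic()

    def fast_path(self, pkt):
        """Per-packet predicate: drop if a live rule matches, else pass."""
        fp = (pkt["proto"], pkt["dst_port"], pkt["len"])
        if fp in self.rules:
            self.rules[fp] = time.monotonic()   # refresh on hit
            return "drop"
        return "pass"

    def expire(self):
        """Auto-expiry: remove rules whose hits have decayed."""
        now = time.monotonic()
        self.rules = {fp: t for fp, t in self.rules.items()
                      if now - t < EXPIRY_SECS}
```

The point of the shape: the expensive work (counting, pattern generation) happens off the packet path, and the per-packet check is a cheap lookup, which is what makes running the loop on every node affordable.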
Why the defender wins
- No central target. There is nothing to DDoS: every server is a scrubbing service, so you'd have to flood the entire fleet's capacity — which is, by construction, what the fleet is sized for.
- No human-operator bottleneck. A 45-second attack doesn't wait for an on-call rotation. At scale, alerting humans per attack would make the SOC useless within a day (477 POPs × many attacks/day = infinite pager noise). Fully autonomous response is the only scalable posture.
- Alert-free common case. "No alerts were triggered, no incidents caused" — the attack is a non-event. Humans review retrospective aggregates (threat reports, customer dashboards), not per-attack pages.
- Blast radius per node. A misfire (overly broad fingerprint) affects at most one node's traffic until gossip converges or the rule expires; not the entire fleet.
Where it works
- DDoS mitigation at CDN scale — the Cloudflare instance. Same shape available on any anycast CDN with edge code-deployability (Fastly Compute@Edge, AWS Shield on CloudFront, etc., though public internals vary).
- Bot detection at edge — same topology, different detection heuristics.
- Rate-limiting / WAF — every edge POP enforces locally, central only stores the rule set.
- Plausible-but-less-documented: anomaly detection on CDN metrics / logs where the action is "tag the response" rather than "drop the packet".
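The rate-limiting split ("every edge POP enforces locally, central only stores the rule set") can be sketched as a token bucket instantiated per POP from a centrally distributed config. The class and parameter names here are hypothetical; the point is that the control plane ships only the rule, never the traffic.

```python
import time

class TokenBucket:
    """Toy per-POP enforcer; evaluated entirely on the local clock."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# The central tier only distributes this ruleset (illustrative shape);
# each POP builds its own buckets and enforces locally.
ruleset = {"/login": {"rate_per_sec": 5, "burst": 10}}
buckets = {path: TokenBucket(**cfg) for path, cfg in ruleset.items()}
```

One consequence of local enforcement: the effective global limit is roughly per-POP limit × number of POPs a client can reach, which is usually acceptable for abuse control but matters if the limit must be exact.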
Where it doesn't work
- Stateful attacks that need cross-POP correlation. A low-and-slow attack that spreads probes across many POPs, each staying below per-POP thresholds, requires a central correlator. Autonomous per-node mitigation misses these; gossip helps but doesn't close the gap.
- Application-layer attacks needing full request context. L7 bot detection often needs user-agent or behaviour history that doesn't fit in XDP programs; detection moves to a user-space proxy (systems/pingora) that is still per-POP but heavier-weight.
- Environments without anycast. If the ingress topology concentrates traffic (single-region cloud deployments), the pattern's anycast premise is gone; centralised scrubbing is the natural alternative.
Constraints / risks
- Fingerprint-compile latency is the attack-onset-to-drop budget. If compile+deploy takes seconds, the first seconds of an attack may reach origin. Cloudflare doesn't publish this number; "no incidents caused" implies it's well under the attack's 45-second duration.
- False positives at the edge are hard to audit. Packets dropped in XDP are gone; no user-space log, by design (cost reasons). Retrospective analysis relies on sampled counters / fingerprint hit rates, not on captured packets.
- Gossip convergence = fleet consistency window. Fingerprints derived at one POP lag at peer POPs. An attack moving faster than gossip convergence effectively gets re-discovered at each POP.
- Heuristic escape hatch. A sufficiently randomised attacker could starve the permutation search; mitigation efficacy is not a guarantee, just a high-90%s heuristic floor.
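The gossip-convergence window can be made concrete with a toy push-gossip simulation: a fingerprint first seen at one node reaches the rest of the fleet only after some number of rounds, and an attack that moves POPs faster than that window is re-discovered from scratch at each new POP. The fan-out/round model below is illustrative, not the real propagation protocol.

```python
import random

def gossip_rounds_to_converge(n_nodes, fanout, seed=0):
    """Rounds until every node knows a fingerprint first seen at node 0.

    Each round, every informed node pushes the fingerprint to `fanout`
    peers chosen uniformly at random (toy model; duplicates and self-sends
    are allowed, which only slows convergence).
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        rounds += 1
        for _node in list(informed):
            for peer in rng.sample(range(n_nodes), fanout):
                informed.add(peer)
    return rounds
```

Push gossip spreads roughly exponentially, so convergence is logarithmic in fleet size; the risk named above is the attack that completes (or moves) inside those few rounds.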
Seen in
- sources/2025-06-20-cloudflare-how-cloudflare-blocked-a-monumental-7-3-tbps-ddos-attack — canonical instance: 7.3 Tbps / 4.8 Bpps / 37.4 TB in 45 seconds, 122,145 source IPs across 161 countries, fully autonomously mitigated in 477 data centres across 293 locations — no human intervention, no alerts, no incidents. Pairs with concepts/anycast for delivery, XDP/eBPF for data-plane, systems/dosd for control, and patterns/gossip-fingerprint-propagation for cross-POP intelligence sharing.