Datadog Workload Protection — File Integrity Monitoring¶
Datadog Workload Protection File Integrity Monitoring (FIM) is the file-monitoring subsystem of systems/datadog-workload-protection, built on systems/ebpf. Its published challenge: detect unauthorized changes to sensitive files in real time across Datadog's entire infrastructure, with enough context to attribute each change to a process and container, at the scale of >10 billion file-related events per minute — without dropping events or degrading host performance.
Architecture¶
- Agent, co-resident on each host, loads eBPF programs into kernel hooks covering file-related syscalls.
- eBPF programs push events through a ring buffer to the Agent.
- Agent runs a user-space rule engine; rule-matching events are serialized (~5 KB/event with process + container context) and forwarded to the Datadog backend for detection and notification.
- Agent-side rules discard noise before it ever crosses the network (concepts/edge-filtering).
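The agent-side step above can be sketched in miniature. This is illustrative only (the event fields, rule shapes, and function names are assumptions, not Datadog's actual code): events arrive from the ring buffer, the local rule engine evaluates them, and only matches are serialized and forwarded.

```python
from dataclasses import dataclass
import fnmatch
import json

@dataclass
class FileEvent:
    # Context the eBPF programs attach in-kernel (field names are illustrative).
    path: str
    syscall: str
    pid: int
    container_id: str

# Agent-side rules: forward only events touching sensitive paths.
RULES = [
    {"id": "passwd_write", "syscall": "open", "path_glob": "/etc/passwd"},
    {"id": "etc_any",      "syscall": "open", "path_glob": "/etc/*"},
]

def matches(rule: dict, ev: FileEvent) -> bool:
    return ev.syscall == rule["syscall"] and fnmatch.fnmatch(ev.path, rule["path_glob"])

def process(events: list[FileEvent]) -> list[str]:
    """Edge filtering: noise is dropped here, on the host, before the network."""
    forwarded = []
    for ev in events:
        hit = next((r for r in RULES if matches(r, ev)), None)
        if hit is None:
            continue  # discarded locally; never crosses the network
        forwarded.append(json.dumps({"rule": hit["id"], **ev.__dict__}))
    return forwarded
```

Feeding this a mixed stream, only the `/etc/*` events survive to be serialized; everything else dies on the host.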
The key design moves¶
- Why eBPF over alternatives. Periodic filesystem scans miss tamper-then-revert changes and lack change context. `inotify` has no process/container correlation. `auditd` has the context but struggles under heavy system load. eBPF gave the team real-time observability with context, plus verifier-gated kernel safety.
- Agent-side rule evaluation (concepts/edge-filtering). Naïve forwarding would be multi-TB/s fleet-wide; evaluating rules locally drops the stream from ~10B events/min to ~1M/min before it leaves the host.
- In-kernel filtering (concepts/in-kernel-filtering). The ring buffer itself becomes the bottleneck at ~5K syscalls/sec on sensitive workloads. Moving as much rule evaluation as eBPF verifier limits allow into kernel space drastically reduces user-space pressure.
- Two-stage evaluation (patterns/two-stage-evaluation). Cheap kernel pass using approver/discarder eBPF maps, then a second deeper pass in user space with rich correlations. The kernel stage protects the user-space stage; the user-space stage protects the backend.
- Approvers + discarders (patterns/approver-discarder-filter).
  - Approvers — concrete values extracted at rule-compile time (e.g. `/etc/passwd` from an `open.file.path == "/etc/passwd"` clause), loaded into an eBPF map; events that match are forwarded.
  - Discarders — runtime-learned values the rule engine can prove will never match any active rule (e.g. `/tmp` under a `/etc/*`-only ruleset), loaded into an LRU eBPF map so the hottest noise stays resident with bounded memory.
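The approver/discarder dual and the two-stage split can be sketched as a user-space model. This is a sketch under stated assumptions, not Datadog's implementation: real approvers and discarders live in eBPF maps keyed by the rule compiler, and stage 1 runs in kernel space; here both stages are plain Python, with an `LRUSet` standing in for the LRU eBPF map.

```python
from collections import OrderedDict
import fnmatch

class LRUSet:
    """Bounded LRU set standing in for the discarder eBPF LRU map."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._d = OrderedDict()

    def add(self, key: str) -> None:
        self._d[key] = None
        self._d.move_to_end(key)
        if len(self._d) > self.capacity:
            self._d.popitem(last=False)  # evict the coldest entry

    def __contains__(self, key: str) -> bool:
        if key in self._d:
            self._d.move_to_end(key)  # touching keeps hot noise resident
            return True
        return False

RULE_GLOBS = ["/etc/*"]                     # active ruleset (illustrative)
APPROVERS = {"/etc/passwd", "/etc/shadow"}  # concrete values compiled from rules
DISCARDERS = LRUSet(capacity=4)             # learned never-match values

def kernel_stage(path: str) -> bool:
    """Stage 1 (cheap; in-kernel in the real system): map lookups only."""
    if path in APPROVERS:
        return True   # known-interesting: forward
    if path in DISCARDERS:
        return False  # known noise: drop before the ring buffer
    return True       # unknown: punt to user space

def user_stage(path: str) -> bool:
    """Stage 2 (rich; user space): full rule evaluation, and it teaches stage 1."""
    if any(fnmatch.fnmatch(path, g) for g in RULE_GLOBS):
        return True
    DISCARDERS.add(path)  # provably never matches: future repeats die in stage 1
    return False
```

The first event for a noisy path pays the user-space cost once; every repeat is then dropped by the cheap stage, which is the mechanism behind the kernel stage protecting the user-space stage.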
Reported outcome¶
- ~94% of events pre-filtered directly in the kernel.
- Input: >10B events/min → Output: ~1M events/min crossing the network to the backend.
- No dropped events.
- "Dramatically lower CPU usage" vs. forwarding-everything.
Seen in¶
- sources/2025-11-18-datadog-ebpf-fim-filtering — full architectural narrative.
- sources/2026-01-07-datadog-hardening-ebpf-for-runtime-security — 5-year operational retrospective; FIM appears as the most-cited worked example (~95% of events dropped before user space under the default ruleset). The broader Workload Protection infrastructure (CI matrix, CO-RE + fallbacks, systems/ebpf-manager, self-tests, minimum-viable hook set) is framed as what makes those FIM numbers sustainable.
Related¶
- systems/datadog-workload-protection — parent product.
- systems/ebpf — kernel runtime.
- concepts/in-kernel-filtering — the filter-at-producer move.
- concepts/edge-filtering — agent-side rules as an edge filter.
- patterns/two-stage-evaluation — kernel (cheap) + user-space (rich) staged filter.
- patterns/approver-discarder-filter — the named static + dynamic dual used here.
- concepts/control-plane-data-plane-separation — user-space rule engine as control, eBPF programs + maps as data plane.