
Datadog Workload Protection — File Integrity Monitoring

Datadog Workload Protection File Integrity Monitoring (FIM) is the file-monitoring subsystem of systems/datadog-workload-protection, built on systems/ebpf. The published challenge: detect unauthorized changes to sensitive files in real time across Datadog's entire infrastructure, with enough context to attribute each change to a process and container, at a scale of more than 10 billion file-related events per minute — all without dropping events or degrading host performance.

Architecture

  • Agent, co-resident on each host, loads eBPF programs into kernel hooks covering file-related syscalls.
  • eBPF programs push events through a ring buffer to the Agent.
  • Agent runs a user-space rule engine; rule-matching events are serialized (~5 KB per event, including process and container context) and forwarded to the Datadog backend for detection and notification.
  • Agent-side rules discard noise before it ever crosses the network (concepts/edge-filtering).
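The flow above can be sketched as a small simulation. This is an illustration only — the real Agent is written in Go and the kernel side in eBPF C, and every rule name and field here is hypothetical:

```python
# Minimal simulation of the FIM pipeline: kernel hooks emit file events,
# the Agent's user-space rule engine matches them, and only matching
# events (with process/container context) ever leave the host.
# All rules, fields, and names are hypothetical illustrations.

RULES = [
    {"id": "passwd_write", "path": "/etc/passwd", "op": "open"},
    {"id": "shadow_write", "path": "/etc/shadow", "op": "open"},
]

def match_rules(event):
    """Agent-side rule engine: return ids of rules the event matches."""
    return [r["id"] for r in RULES
            if event["path"] == r["path"] and event["op"] == r["op"]]

def forward(event, rule_ids):
    """Serialize the event with process + container context for the backend."""
    return {"rules": rule_ids,
            "path": event["path"],
            "pid": event["pid"],
            "container": event["container"]}

# Events as the eBPF ring buffer would deliver them to the Agent.
events = [
    {"op": "open", "path": "/etc/passwd", "pid": 101, "container": "web-1"},
    {"op": "open", "path": "/tmp/scratch", "pid": 102, "container": "web-1"},
    {"op": "open", "path": "/etc/shadow", "pid": 103, "container": "db-0"},
]

sent = [forward(e, m) for e in events if (m := match_rules(e))]
print(len(sent))  # only the 2 rule-matching events cross the network
```

The noise event (`/tmp/scratch`) dies on the host; only rule matches consume network and backend capacity.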

The key design moves

  1. Why eBPF over alternatives. Periodic filesystem scans miss tamper-then-revert and lack change context. inotify has no process/container correlation. auditd has the context but struggles under heavy system load. eBPF gave the team real-time observability with full context and verifier-gated kernel safety.
  2. Agent-side rule evaluation (concepts/edge-filtering). Naïve forwarding would be multi-TB/s fleet-wide; evaluating rules locally drops the stream from ~10B events/min to ~1M/min before it leaves the host.
  3. In-kernel filtering (concepts/in-kernel-filtering). The ring buffer itself becomes the bottleneck at ~5K syscalls/sec on sensitive workloads. Moving as much rule evaluation as eBPF verifier limits allow into kernel space drastically reduces user-space pressure.
  4. Two-stage evaluation (patterns/two-stage-evaluation). Cheap kernel pass using approver/discarder eBPF maps, then a second deeper pass in user space with rich correlations. The kernel stage protects the user-space stage; the user-space stage protects the backend.
  5. Approvers + discarders (patterns/approver-discarder-filter).
     • Approvers — concrete values extracted at rule compile time (e.g. /etc/passwd from an open.file.path == "/etc/passwd" clause), loaded into an eBPF map; matching events are forwarded.
     • Discarders — runtime-learned values the rule engine can prove will never match any active rule (e.g. /tmp under a /etc/*-only ruleset), loaded into an LRU eBPF map so the hottest noise stays resident within bounded memory.
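Moves 4 and 5 compose: approvers and discarders give the cheap kernel stage its fast path, and the user-space stage learns new discarders on the fly. A minimal Python simulation (the real maps are eBPF maps populated from C and Go; the `/etc/*`-only ruleset and all names here are hypothetical):

```python
from collections import OrderedDict

# Compile-time approvers: concrete values extracted from the ruleset.
APPROVERS = {"/etc/passwd", "/etc/shadow"}

class LRUDiscarders:
    """Bounded LRU set standing in for the discarder eBPF LRU map."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()

    def add(self, path):
        self.entries[path] = True
        self.entries.move_to_end(path)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the coldest entry

    def __contains__(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)   # hot noise stays resident
            return True
        return False

discarders = LRUDiscarders()

def kernel_stage(path):
    """Cheap in-kernel pass: approve, discard, or escalate to user space."""
    if path in APPROVERS:
        return "forward"
    if path in discarders:
        return "drop"
    return "user_space"

def user_space_stage(path):
    """Deeper pass: if no active rule can ever match, teach the kernel."""
    if not path.startswith("/etc/"):  # hypothetical /etc/*-only ruleset
        discarders.add(path)
        return "drop"
    return "evaluate"

# First /tmp event reaches user space; repeats die in the kernel stage.
first = kernel_stage("/tmp/x")
user_space_stage("/tmp/x")
second = kernel_stage("/tmp/x")
print(first, second)  # user_space drop
```

Each stage shields the next: repeated noise never leaves the kernel after the first escalation, and the LRU bound keeps discarder memory fixed no matter how varied the noise is.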

Reported outcome

  • ~94% of events pre-filtered directly in the kernel.
  • Input: >10B events/min → Output: ~1M events/min crossing the network to the backend.
  • No dropped events.
  • "Dramatically lower CPU usage" vs. forwarding everything.
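Taking the reported figures at face value, a quick consistency check shows why both filtering stages are needed — in-kernel filtering alone still leaves hundreds of millions of events per minute for user space:

```python
# Sanity-check the reported figures (reported values, not measurements).
total = 10_000_000_000        # >10B events/min entering kernel hooks
kernel_filtered = 0.94        # ~94% pre-filtered directly in the kernel
to_user_space = total * (1 - kernel_filtered)
to_backend = 1_000_000        # ~1M events/min crossing the network

print(int(to_user_space))     # ~600M events/min still reach user space
print(to_backend / total)     # overall reduction factor of ~1e-4
```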
