SYSTEM Cited by 2 sources
Datadog Workload Protection¶
Datadog Workload Protection is Datadog's Linux runtime-security product — an on-host agent powered by systems/ebpf that detects active threats (file-integrity violations, process execution, network behaviour) at kernel-syscall granularity, with process + container attribution, without custom kernel modules.
The product is 5+ years old as of 2026 and is deployed across Datadog's own edge infrastructure as well as customer fleets.
Design goals¶
- Real-time detection of in-progress threats (not periodic scan) — window between vulnerability disclosure and patch rollout, or detection of zero-days, without relying on shift-left being complete.
- Process + container context on every event (what
inotifylacks, whatauditdcan't scale to). - Low kernel / host overhead — the agent must not become the performance or stability issue.
- Kernel-safe — no custom kernel modules.
Architecture¶
- Per-host Agent loads eBPF programs into kernel hooks (file syscalls, process exec paths, network-layer TC programs, etc.).
- eBPF programs push selected events through ring buffers to the Agent.
- Agent runs a user-space rule engine, serialises matches (~5 KB/event with process + container + other context), forwards to the Datadog backend.
- Agent-side rules + in-kernel filters drop noise before it leaves the host (concepts/edge-filtering) or before it leaves the kernel (concepts/in-kernel-filtering).
Components / building blocks¶
- File Integrity Monitoring (FIM) — the most-documented subsystem; at ~10B file-events/min fleet-wide, ~94% filtered in-kernel via approvers + discarders in eBPF maps.
- Process-execution monitoring with interpreter-aware rules
(
process.interpreter.*,process.ancestors.interpreter.*), path resolution, hard-link enumeration. - Network TC classifiers (
SCHED_CLS) — source of the 2022 systems/cilium multi-tenancy incident. - Dedicated
bpfevent type capturing all BPF activity (program loads, map ops, attachments) — makes the security tool's own kernel usage observable, and lets other eBPF vendors on the same host be inventoried + rule-matched.
Operational infrastructure¶
- systems/ebpf-manager — Datadog's open-source Go library for eBPF-program lifecycle shared across Workload Protection, Cloud Network Monitoring, Universal Service Monitoring.
- systems/co-re + fallback offset-guessing + hardcoded offsets — kernel-structure read stability back to kernel 4.14 (older for some CentOS).
- Minimum-viable hook set declared via
ebpf-manager— if the critical programs fail to load/attach, Workload Protection refuses to start with an actionable error instead of silently serving reduced coverage. - CI matrix of kernel versions + distributions — "not supported unless actively tested in CI."
- Agent self-tests at startup, reporting to the Datadog backend — customers get visibility into their exact security coverage state.
- Dedicated rule-CI lets detection engineers test rules before release.
- Staged rollout (patterns/staged-rollout) — CI matrix → Datadog internal dogfooding → gradual customer rollout for both agent code and detection content.
Reported scale / outcomes¶
- ~10B file-related events/minute fleet-wide at steady state, ~1M events/minute after in-kernel + Agent-side filtering.
- ~94% of events pre-filtered in the kernel (FIM; default ruleset).
- No dropped events under the post-filtering load.
- Runs on Datadog's own edge — a strong internal proof point.
Research / prior work¶
- ebpfkit (2021 hackathon; BlackHat 2021 / DEF CON 29) — an eBPF rootkit PoC (process hiding, network scanning, exfil, C2, persistence) shipped by the same team to probe eBPF's own attack surface. Directly informs Workload Protection's own eBPF-tamper-detection strategy.
- "Return to sender" (BlackHat 2022) — Datadog proposals for protecting eBPF-based detection tools from kernel-exploit-based disablement; several are now in the agent.
Seen in¶
- sources/2025-11-18-datadog-ebpf-fim-filtering — FIM-specific architectural deep-dive (approvers, discarders, LRU eBPF maps, two-stage evaluation, 94% in-kernel filter rate).
- sources/2026-01-07-datadog-hardening-ebpf-for-runtime-security — 5-year operational retrospective: kernel-compat, hook coverage, data-read correctness, cache pitfalls, rule-writing, eBPF attack surface, multi-tenancy (the Cilium incident), performance cost, safe rollout.
Related¶
- systems/ebpf — substrate
- systems/datadog-workload-protection-fim — FIM subsystem
- systems/ebpf-manager — lifecycle library
- systems/co-re — portability technique
- systems/cilium — 2022 multi-tenancy incident
- concepts/ebpf-verifier — the constraint that shapes everything
- patterns/shared-kernel-resource-coordination — generalised lesson from the Cilium incident
- patterns/approver-discarder-filter — FIM's in-kernel filter dual
- patterns/two-stage-evaluation — kernel (cheap) → user-space (rich)
- patterns/staged-rollout — CI matrix → dogfood → gradual