SYSTEM Cited by 2 sources

Datadog Workload Protection

Datadog Workload Protection is Datadog's Linux runtime-security product: an on-host agent built on eBPF (systems/ebpf) that detects active threats (file-integrity violations, process execution, network behaviour) at kernel-syscall granularity, attributes each event to a process and container, and requires no custom kernel modules.

The product is 5+ years old as of 2026 and is deployed across Datadog's own edge infrastructure as well as customer fleets.

Design goals

  • Real-time detection of in-progress threats rather than periodic scanning: covers the window between vulnerability disclosure and patch rollout, as well as zero-days, without assuming that shift-left controls are complete.
  • Process + container context on every event (inotify does not provide it; auditd cannot deliver it at scale).
  • Low kernel / host overhead — the agent must not become the performance or stability issue.
  • Kernel-safe — no custom kernel modules.

Architecture

  • Per-host Agent loads eBPF programs into kernel hooks (file syscalls, process exec paths, network-layer TC programs, etc.).
  • eBPF programs push selected events through ring buffers to the Agent.
  • Agent runs a user-space rule engine, serialises matches (~5 KB/event with process + container + other context), forwards to the Datadog backend.
  • Agent-side rules + in-kernel filters drop noise before it leaves the host (concepts/edge-filtering) or before it leaves the kernel (concepts/in-kernel-filtering).
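
The Agent-side half of this flow (drain events, evaluate rules, serialise matches with their context) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` fields, the two rules in `RULES`, and the `drain` function are hypothetical names, a `deque` stands in for the kernel ring buffer, and the real Agent's rule language and wire format are not shown in this page.

```python
import json
from collections import deque
from dataclasses import dataclass, asdict


@dataclass
class Event:
    """Hypothetical event shape: what happened plus process/container context."""
    kind: str          # e.g. "open" or "exec"
    path: str
    pid: int
    comm: str          # process name
    container_id: str  # empty string when the process is not containerised


# Hypothetical rules: a name and a predicate over one event.
RULES = [
    ("shadow_open", lambda e: e.kind == "open" and e.path == "/etc/shadow"),
    ("exec_from_tmp", lambda e: e.kind == "exec" and e.path.startswith("/tmp/")),
]


def drain(ring):
    """Consume every queued event and return the serialised rule matches."""
    matches = []
    while ring:
        event = ring.popleft()
        for name, predicate in RULES:
            if predicate(event):
                # Context travels with the match, so the backend needs no lookup.
                matches.append(json.dumps({"rule": name, **asdict(event)}, sort_keys=True))
    return matches


ring = deque([
    Event("open", "/etc/shadow", 42, "vi", "abc123"),
    Event("open", "/var/log/syslog", 43, "cat", ""),
    Event("exec", "/tmp/payload", 44, "sh", "abc123"),
])
matches = drain(ring)
```

Only the two events that match a rule are serialised; the unmatched one is dropped on the host, which is the edge-filtering behaviour described above.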

Components / building blocks

  • File Integrity Monitoring (FIM) — the most-documented subsystem; at ~10B file-events/min fleet-wide, ~94% filtered in-kernel via approvers + discarders in eBPF maps.
  • Process-execution monitoring with interpreter-aware rules (process.interpreter.*, process.ancestors.interpreter.*), path resolution, hard-link enumeration.
  • Network TC classifiers (SCHED_CLS) — source of the 2022 systems/cilium multi-tenancy incident.
  • Dedicated bpf event type capturing all BPF activity (program loads, map ops, attachments) — makes the security tool's own kernel usage observable, and lets other eBPF vendors on the same host be inventoried + rule-matched.
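
As a rough illustration of the approver/discarder filtering mentioned for FIM, the sketch below simulates the two lookups in user-space Python. The semantics are an assumption made for the example: approvers are treated as an allow-list derived from the ruleset (here, file basenames), and discarders as a deny-list of paths the Agent has learned can never match a rule. The `KernelFilter` class and its methods are hypothetical; in the product these lookups run in eBPF programs against eBPF maps.

```python
import os


class KernelFilter:
    """User-space stand-in for two eBPF hash maps consulted inside a hook."""

    def __init__(self, approved_basenames):
        self.approvers = set(approved_basenames)  # derived from the ruleset
        self.discarders = set()                   # learned at runtime

    def should_forward(self, path):
        """Return True when the event should be pushed to user space."""
        if path in self.discarders:
            return False  # already known to match no rule
        return os.path.basename(path) in self.approvers

    def add_discarder(self, path):
        """Called by the Agent after it evaluated `path` and no rule matched."""
        self.discarders.add(path)


flt = KernelFilter({"passwd", "shadow"})
first_seen = flt.should_forward("/home/u/backup/passwd")   # passes the approver
flt.add_discarder("/home/u/backup/passwd")                 # Agent: no rule matches
second_seen = flt.should_forward("/home/u/backup/passwd")  # now dropped early
```

The point of the split is that the expensive decision (full rule evaluation) happens once in user space, and its negative result is cached where the next event will be dropped cheaply.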

Operational infrastructure

  • systems/ebpf-manager: Datadog's open-source Go library for eBPF-program lifecycle management, shared across Workload Protection, Cloud Network Monitoring, and Universal Service Monitoring.
  • CO-RE (systems/co-re), with fallback offset-guessing and hardcoded offsets: keeps kernel-structure reads stable back to kernel 4.14 (older for some CentOS kernels).
  • Minimum-viable hook set declared via ebpf-manager — if the critical programs fail to load/attach, Workload Protection refuses to start with an actionable error instead of silently serving reduced coverage.
  • CI matrix of kernel versions + distributions — "not supported unless actively tested in CI."
  • Agent self-tests at startup, reporting to the Datadog backend — customers get visibility into their exact security coverage state.
  • Dedicated rule-CI lets detection engineers test rules before release.
  • Staged rollout (patterns/staged-rollout) — CI matrix → Datadog internal dogfooding → gradual customer rollout for both agent code and detection content.
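
The "refuse to start rather than run degraded" behaviour of the minimum-viable hook set can be sketched as a startup check. This is not the ebpf-manager API (which is a Go library); `check_hooks`, `StartupError`, and the hook names are hypothetical and only illustrate the policy: a missing critical hook is fatal, a missing optional hook is reported.

```python
class StartupError(RuntimeError):
    """Raised when the minimum-viable hook set could not be attached."""


def check_hooks(attached, critical, optional):
    """Fail fast on missing critical hooks; return missing optional ones."""
    missing_critical = sorted(set(critical) - set(attached))
    if missing_critical:
        raise StartupError(
            "refusing to start: critical hooks not attached: "
            + ", ".join(missing_critical)
        )
    # Optional gaps do not block startup but should be surfaced to the backend.
    return sorted(set(optional) - set(attached))


CRITICAL = ["hook_open", "hook_exec"]  # hypothetical names
OPTIONAL = ["hook_tc_ingress"]         # hypothetical name

degraded = check_hooks({"hook_open", "hook_exec"}, CRITICAL, OPTIONAL)

try:
    check_hooks({"hook_open"}, CRITICAL, OPTIONAL)
    error_message = ""
except StartupError as err:
    error_message = str(err)
```

Returning the list of missing optional hooks instead of discarding it fits the self-test reporting described above, where coverage state is made visible to the customer.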

Reported scale / outcomes

  • ~10B file-related events/minute fleet-wide at steady state, ~1M events/minute after in-kernel + Agent-side filtering.
  • ~94% of events pre-filtered in the kernel (FIM; default ruleset).
  • No dropped events under the post-filtering load.
  • Runs on Datadog's own edge — a strong internal proof point.
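
To make the reported figures easier to relate to each other, the arithmetic can be worked through as below. All inputs are the approximate numbers quoted in this page. Two steps are assumptions made for the calculation: that the ~94% in-kernel rate applies to the full ~10B events/minute, and that every forwarded event is ~5 KB.

```python
# Figures quoted in this page (all approximate).
RAW_EVENTS_PER_MIN = 10_000_000_000  # ~10B file events/minute, fleet-wide
KERNEL_FILTERED_PERCENT = 94         # ~94% dropped in-kernel (FIM, default ruleset)
FORWARDED_PER_MIN = 1_000_000        # ~1M events/minute after all filtering
EVENT_BYTES = 5_000                  # ~5 KB per serialised event

# Events that survive the in-kernel filter and reach the Agent.
leaving_kernel = RAW_EVENTS_PER_MIN * (100 - KERNEL_FILTERED_PERCENT) // 100

# Share of those that Agent-side filtering then drops.
agent_drop_ratio = 1 - FORWARDED_PER_MIN / leaving_kernel

# End-to-end reduction: raw events per forwarded event.
overall_reduction = RAW_EVENTS_PER_MIN // FORWARDED_PER_MIN

# Assumption: every forwarded event is ~5 KB.
bytes_per_min = FORWARDED_PER_MIN * EVENT_BYTES
mb_per_sec = bytes_per_min / 60 / 1_000_000

print(leaving_kernel, overall_reduction, round(mb_per_sec, 1))
```

Under these assumptions roughly 600M events/minute still reach the Agents, Agent-side filtering removes about 99.8% of those, the end-to-end reduction is about 10,000:1, and the forwarded stream is on the order of 5 GB/minute (about 83 MB/s) fleet-wide.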

Research / prior work

  • ebpfkit (2021 hackathon; BlackHat 2021 / DEF CON 29) — an eBPF rootkit PoC (process hiding, network scanning, exfil, C2, persistence) shipped by the same team to probe eBPF's own attack surface. Directly informs Workload Protection's own eBPF-tamper-detection strategy.
  • "Return to sender" (BlackHat 2022) — Datadog proposals for protecting eBPF-based detection tools from kernel-exploit-based disablement; several are now in the agent.
