

Cilium

Cilium is a widely adopted Kubernetes CNI (cilium.io) built on systems/ebpf: pod networking, L3/L4/L7 policy enforcement, a service-mesh data plane, and observability, all implemented via eBPF programs attached to TC, XDP, socket, and cgroup hooks.

This wiki currently references Cilium as an example of multi-tenant eBPF on the same host, not for its own internals.

2022 Datadog × Cilium incident

Both products use TC classifier (SCHED_CLS) eBPF programs on pod network interfaces. Cilium attaches with a hardcoded priority of 1 and a hardcoded handle of 0:1 (a handle uniquely identifies a filter instance within a given priority).
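A toy model makes the addressing problem concrete (illustrative only, not kernel code): on one interface and direction, a classifier is addressed by (priority, handle), and attaching at an occupied address replaces the filter already there rather than coexisting with it.

```python
# Toy model of TC classifier addressing on a single interface/direction.
# The names here are hypothetical; the kernel's real data structures differ.

filters = {}  # (priority, handle) -> owning tool

def attach(priority, handle, owner):
    """Attach a filter; returns whoever was silently displaced, if anyone."""
    replaced = filters.get((priority, handle))
    filters[(priority, handle)] = owner
    return replaced

attach(1, "0:1", "cilium")               # Cilium's hardcoded address
displaced = attach(1, "0:1", "other")    # a second tool at the same address
print(displaced)                         # -> cilium: replacement, not coexistence
```

Two independent tools that both hardcode (1, 0:1) therefore cannot safely share a hook: whichever attaches second wins.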

Race condition on pod startup: systems/datadog-workload-protection sometimes loaded its TC filters before Cilium, unintentionally grabbing handle 0:1. When Cilium later loaded, it replaced Datadog's filters with its own. Datadog's network-namespace cleanup logic, which watches for unexpected handle changes to avoid leaking resources, interpreted that replacement as a teardown signal and deleted Cilium's filters, breaking pod connectivity (Source: sources/2026-01-07-datadog-hardening-ebpf-for-runtime-security, CiliumCon 2023 post-mortem).
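The full sequence can be replayed against a toy model of the TC hook (class and function names here are hypothetical, not Datadog's or Cilium's actual code):

```python
# Hypothetical replay of the race: a TC hook modeled as a dict keyed by
# (priority, handle), where attaching at an occupied key replaces the filter.

class TcHook:
    def __init__(self):
        self.filters = {}  # (priority, handle) -> owning agent

    def attach(self, prio, handle, owner):
        self.filters[(prio, handle)] = owner   # replaces any existing filter

    def delete(self, prio, handle):
        self.filters.pop((prio, handle), None)

def cleanup_watcher(hook, expected_owner="datadog"):
    # Naive cleanup: if "our" handle no longer holds our filter, assume the
    # namespace is being torn down and delete the filter to avoid a leak.
    if hook.filters.get((1, "0:1")) != expected_owner:
        hook.delete(1, "0:1")   # ...which now deletes Cilium's filter instead

hook = TcHook()
hook.attach(1, "0:1", "datadog")   # 1. Datadog wins the startup race
hook.attach(1, "0:1", "cilium")    # 2. Cilium loads, replaces Datadog's filter
cleanup_watcher(hook)              # 3. watcher misreads the change as teardown
print(hook.filters)                # {} -> no TC filters left; connectivity breaks
```

Each step is locally reasonable, which is why the quote below holds: the bug only exists in the composition of the two tools.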

"Both solutions are independently correct and stable" — Datadog, on what makes the class of bug hard.

Takeaway

TC priorities, TC handles, cgroup program ordering, and similar shared kernel resources are effectively an inter-vendor protocol on any host with more than one eBPF tool. The generalised lesson is patterns/shared-kernel-resource-coordination:

  • Use higher numeric default priorities (TC evaluates lower numbers first) so infrastructure-level classifiers (CNIs) get to run first.
  • Harden cleanup against races — default to never auto-deleting queuing disciplines or shared kernel resources.
  • Coordinate with peer vendors on conventions; detect and warn on unexpected co-residents.
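The first two mitigations can be sketched against a toy TC model (priorities, handles, and names below are hypothetical, not Datadog's actual values): pick a high numeric priority and a non-conflicting handle, and only ever delete filters positively identified as your own.

```python
# Hedged sketch of "run after the CNI" + "harden cleanup against races".

class TcHook:
    def __init__(self):
        self.filters = {}  # (priority, handle) -> owner

    def attach(self, prio, handle, owner):
        self.filters[(prio, handle)] = owner

    def delete(self, prio, handle):
        self.filters.pop((prio, handle), None)

AGENT_PRIO = 10       # high numeric priority: TC runs lower numbers first,
AGENT_HANDLE = "0:5"  # and the CNI's conventional priority-1 / 0:1 slot is left alone

def safe_cleanup(hook, owner="security-agent"):
    """Delete only filters we positively own; warn on unexpected co-residents."""
    entry = hook.filters.get((AGENT_PRIO, AGENT_HANDLE))
    if entry == owner:
        hook.delete(AGENT_PRIO, AGENT_HANDLE)
    elif entry is not None:
        print(f"warning: unexpected co-resident filter {entry!r}; leaving it intact")

hook = TcHook()
hook.attach(1, "0:1", "cilium")                    # CNI keeps its slot
hook.attach(AGENT_PRIO, AGENT_HANDLE, "security-agent")
safe_cleanup(hook)
# Only the agent's own filter is removed; Cilium's filter survives.
```

The ownership check is the key change from the incident: cleanup acts on identity ("is this mine?") rather than on change detection ("did something move?").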
