CONCEPT
cgroup ID¶
The cgroup ID is the 64-bit kernel-internal identifier for a cgroup node in the cgroup v2 hierarchy, stable for the lifetime of the cgroup. It is what kernel-side code (tracepoints, eBPF programs, LSM hooks) uses to attribute a per-task event to the cgroup the task is currently in — and therefore, on typical container platforms, to the container that owns the task.
Where it lives in the kernel¶
For a struct task_struct *task:
- task->cgroups — the task's struct css_set, pointing at the set of cgroups the task is in across all controllers.
- dfl_cgrp — the default-hierarchy cgroup (cgroup v2).
- kn — the kernfs_node representing the cgroup in the cgroup pseudo-filesystem.
- id — the 64-bit inode-number-like identifier kernfs_node.id.
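Because kernfs_node.id doubles as the inode number of the cgroup's directory in the cgroup v2 filesystem, a userspace agent can recover the same ID with plain stat(2). A minimal sketch — the helper name and any cgroupfs paths are illustrative, not from the source:

```c
/* Userspace sketch: on cgroup v2 the cgroup ID equals the inode number
 * of the cgroup's directory under /sys/fs/cgroup, so stat(2) suffices.
 * cgroup_id_of() is a hypothetical helper name. */
#include <stdint.h>
#include <sys/stat.h>

uint64_t cgroup_id_of(const char *cgroup_dir)
{
    struct stat st;

    if (stat(cgroup_dir, &st) != 0)
        return 0; /* unknown or already-removed cgroup */
    return (uint64_t)st.st_ino; /* matches kernfs_node.id on cgroup v2 */
}

/* e.g. cgroup_id_of("/sys/fs/cgroup/system.slice") */
```

This is how an agent can seed its cgroup_id → container_id mapping without any kernel-side help.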
Why reading it from BPF needs RCU¶
task->cgroups is an RCU-protected pointer: it can be swapped when a
task is migrated between cgroups (e.g. cgroup.procs write, clone
into a new cgroup, systemd scope changes). Dereferencing it without
being in an RCU read-side critical section races against the
writer, which may free the old css_set after publishing its replacement.
From an eBPF program, the BPF subsystem exposes two kfuncs (kernel functions callable from BPF, vetted by the verifier) to bracket the deref:
void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

u64 get_task_cgroup_id(struct task_struct *task)
{
    struct css_set *cgroups;
    u64 cgroup_id;

    /* Hold the RCU read lock across the pointer walk so the
     * css_set cannot be freed out from under us. */
    bpf_rcu_read_lock();
    cgroups = task->cgroups;
    cgroup_id = cgroups->dfl_cgrp->kn->id;
    bpf_rcu_read_unlock();

    return cgroup_id;
}
"The cgroup information in the process struct is safeguarded by an RCU (Read Copy Update) lock. To safely access this RCU-protected information, we can leverage kfuncs in eBPF. kfuncs are kernel functions that can be called from eBPF programs. There are kfuncs available to lock and unlock RCU read-side critical sections. These functions ensure that our eBPF program remains safe and efficient while retrieving the cgroup ID from the task struct." (Source: sources/2024-09-11-netflix-noisy-neighbor-detection-with-ebpf)
The verifier (concepts/ebpf-verifier) statically enforces that lock and unlock are matched on every exit path; a program that can return with the RCU read lock still held fails to load.
Why it's the right key for attribution¶
The cgroup ID is stable and globally unique on the host. A userspace
agent maintains a small cgroup_id → container_id map (populated
from the container runtime / orchestrator) and joins on it when
consuming events. Unknown cgroup IDs — i.e. cgroups that do not
correspond to a managed container — are attributed to system services
(system.slice, host daemons, kernel threads) instead of being dropped,
because those are real noisy-neighbor sources too.
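The join-with-fallback logic can be sketched in a few lines. This assumes a tiny fixed-size table for illustration — a real agent would use a hash map populated from the container runtime — and the struct and function names are hypothetical:

```c
/* Sketch of the userspace join: cgroup_id -> container_id, with unknown
 * IDs falling through to a "system" bucket instead of being dropped. */
#include <stdint.h>

struct cgroup_mapping {
    uint64_t cgroup_id;        /* key: cgroup v2 node ID */
    const char *container_id;  /* value: runtime's container ID */
};

const char *attribute(const struct cgroup_mapping *map, int n,
                      uint64_t cgroup_id)
{
    for (int i = 0; i < n; i++)
        if (map[i].cgroup_id == cgroup_id)
            return map[i].container_id;
    /* Not a managed container: attribute to system services. */
    return "system";
}
```

The fallback branch is the important design choice: host daemons and kernel threads still consume CPU, so their events must land somewhere rather than vanish.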
The alternative key, the PID, is wrong for multi-tenant attribution: PIDs are recycled, PIDs are per-PID-namespace, and threads share a thread-group ID yet can in principle sit in different cgroups. Cgroup IDs have none of those problems.
Used as¶
- Aggregation dimension for per-container Atlas histograms (runq.latency) and counters (sched.switch.out) in Netflix's run-queue monitor.
- Rate-limit key for in-kernel event sampling — see patterns/per-cgroup-rate-limiting-in-ebpf.
- Preempt-cause tag. Reading get_task_cgroup_id(prev) on sched_switch gives the cgroup of the task losing the CPU, so events can be tagged "preempted by same cgroup / different container / system service".
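The tagging itself reduces to a small pure function over the two cgroup IDs. A hedged sketch — the enum and function names are invented here, and whether the incoming cgroup is a managed container is assumed to come from the agent's cgroup_id → container_id map:

```c
/* Hypothetical preempt-cause classifier: given the outgoing (prev) and
 * incoming (next) cgroup IDs from a sched_switch event, bucket the
 * preemption for per-container metrics. */
#include <stdint.h>
#include <stdbool.h>

enum preempt_cause {
    SAME_CGROUP,     /* task preempted by its own cgroup */
    OTHER_CONTAINER, /* a different managed container took the CPU */
    SYSTEM_SERVICE,  /* host daemon, kernel thread, etc. */
};

enum preempt_cause classify_preempt(uint64_t prev_cg, uint64_t next_cg,
                                    bool next_is_container)
{
    if (next_cg == prev_cg)
        return SAME_CGROUP;
    return next_is_container ? OTHER_CONTAINER : SYSTEM_SERVICE;
}
```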
Seen in¶
- sources/2024-09-11-netflix-noisy-neighbor-detection-with-ebpf — Netflix's eBPF run-queue monitor computes the cgroup ID for both the incoming and outgoing task on every sched_switch via BPF RCU kfuncs, then tags per-container run-queue-latency metrics with the preempting-cgroup category.