
CONCEPT Cited by 1 source

CPU utilization vs saturation

Definition

Utilisation and saturation are two separate measurements of the same CPU. They can (and frequently do) diverge, and conflating them is one of the most common triage mistakes.

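  • Utilisation = the fraction of wall-clock time a CPU was busy servicing work (any non-idle state). On Linux: us + sy + ni + hi + si + st, computed from the cpu lines of /proc/stat. Do not add guest and gnice on top: the kernel already counts them inside us and ni, so including them double-counts virtualised work. iowait counts as idle.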
  • Saturation = the degree to which more work is demanded than the CPU can service, surfaced as queue depth or wait time. On Linux: vmstat's r column (count of tasks running on a CPU plus those waiting to run) or run-queue latency (time a task spends runnable, in TASK_RUNNING, waiting on a run queue before it is dispatched to a CPU).
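
A minimal Python sketch of both definitions, assuming the standard Linux /proc/stat layout described in proc(5). Utilisation comes from the delta between two samples of the aggregate cpu line; saturation is approximated by the instantaneous procs_running counter, the figure vmstat's r is derived from. The helper names parse_proc_stat and utilisation are hypothetical.

```python
from __future__ import annotations

# Column order of the "cpu" lines in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")
# Fields that partition CPU time. guest and guest_nice are excluded because
# the kernel already accounts them inside user and nice.
TIME_FIELDS = FIELDS[:8]


def parse_proc_stat(text: str) -> tuple[dict[str, int], int]:
    """Return (aggregate cpu counters in clock ticks, procs_running)."""
    cpu: dict[str, int] = {}
    procs_running = 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "cpu":  # aggregate of all CPUs; per-CPU lines are cpu0, cpu1, ...
            cpu = dict(zip(FIELDS, (int(v) for v in parts[1:])))
        elif parts[0] == "procs_running":  # tasks running or runnable right now
            procs_running = int(parts[1])
    return cpu, procs_running


def utilisation(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of CPU time that was non-idle between two samples (0.0-1.0)."""
    delta = {f: after.get(f, 0) - before.get(f, 0) for f in TIME_FIELDS}
    total = sum(delta.values())
    idle = delta["idle"] + delta["iowait"]  # iowait is idle time with I/O outstanding
    return (total - idle) / total if total else 0.0
```

On a live host, feed parse_proc_stat the contents of /proc/stat twice, a second or so apart. procs_running is a point-in-time count, so a single reading is noisy; utilisation is an average over the interval.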

Why both matter

Four corner cases make the distinction load-bearing:

  1. High utilisation, low saturation. CPU 99% busy, r ≤ CPU count. Work arrives at the rate the CPU can service it — high throughput, stable latency. Usually fine.
  2. High utilisation, high saturation. CPU 99% busy, r ≫ CPU count. Work arrives faster than it can be serviced; queues grow; tail latency blows up. The classic CPU-bottleneck shape.
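  3. Low utilisation, high saturation. Uncommon but diagnostic. Runnable tasks are queued while the CPUs look mostly idle, typically because a cgroup has exhausted its CFS quota and is being throttled, or because tasks are pinned to a few CPUs while the rest sit idle. Tasks blocked on locks or I/O are sleeping rather than runnable, so they do not explain this shape on their own: pair it with %iowait to rule I/O in or out, and with the cgroup throttling counters (nr_throttled and throttled time in the cgroup's cpu.stat).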
  4. Low utilisation, low saturation. Healthy idle or under-subscribed host.
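
The four cases can be expressed as a small Python classifier. This is a sketch: classify is an illustrative name, the 0.9 busy_threshold is a hypothetical cut-off for "high utilisation", and saturation is judged only by Gregg's r > CPU count rule. CFS-throttled tasks may not be counted in r, so a throttling-driven case 3 can still land in case 4 here; the cgroup counters are the better check for that.

```python
from __future__ import annotations


def classify(util: float, runnable: int, ncpu: int,
             busy_threshold: float = 0.9) -> str:
    """Map one (utilisation, run-queue depth) sample onto the four corner cases.

    util: fraction of CPU time busy, 0.0-1.0
    runnable: tasks running or waiting to run (vmstat's r)
    ncpu: online CPU count
    busy_threshold: hypothetical cut-off for "high utilisation"
    """
    high_util = util >= busy_threshold
    saturated = runnable > ncpu  # r above the CPU count means tasks are queuing
    if high_util and not saturated:
        return "1: high util, low sat"
    if high_util and saturated:
        return "2: high util, high sat"
    if saturated:
        return "3: low util, high sat"
    return "4: low util, low sat"
```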

Netflix's framing

From Brendan Gregg's 60-second checklist:

r : Number of processes running on CPU and waiting for a turn. This provides a better signal than load averages for determining CPU saturation, as it does not include I/O. To interpret: an "r" value greater than the CPU count is saturation.

And on utilisation:

The CPU time breakdowns will confirm if the CPUs are busy, by adding user + system time.

Two different measurements, read off the same vmstat line: r for saturation, us + sy for utilisation.
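
Reading both signals off one vmstat row can be sketched in Python as below. The helpers read_vmstat_row and cpu_signals are hypothetical; they assume one data row plus vmstat's column-name header line (the one beginning with r b), and they follow the checklist in taking utilisation as us + sy.

```python
from __future__ import annotations


def read_vmstat_row(header: str, row: str) -> dict[str, int]:
    """Pair one vmstat data row with the column-name header line."""
    return dict(zip(header.split(), (int(v) for v in row.split())))


def cpu_signals(sample: dict[str, int], ncpu: int) -> tuple[int, bool]:
    """Return (utilisation % as us + sy, saturated?) for one vmstat row."""
    busy_pct = sample["us"] + sample["sy"]  # utilisation: user + system time
    saturated = sample["r"] > ncpu          # saturation: r above the CPU count
    return busy_pct, saturated
```

The first data row vmstat prints reports averages since boot for most columns, so interval samples (the second and later rows of, say, vmstat 1) are the ones to feed in.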

Example: the "99% CPU, queued" shape

The post's worked example:

 r  b swpd   free   buff  cache   si   so  ... us sy id wa st
34  0    0 200889792  73708 591828    0    0  ... 96  1  3  0  0
32  0    0 200889920  73708 591860    0    0  ... 98  1  1  0  0

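r = 32-34 on a 32-CPU host, us = 96-98, sy = 1. The CPUs are roughly 97-99% utilised (us + sy), and the run queue sits at or just above the CPU count, i.e. at the edge of saturation by the r > CPU count rule. This is not a single overloaded CPU: the whole run queue stays as deep as the CPU count across samples, and mpstat confirms no single CPU is hotter than the others.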
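
The arithmetic for the two rows, as a small Python check. Only r, us and sy are taken from the output above; the elided columns are not needed, and summarise is an illustrative name.

```python
from __future__ import annotations

NCPU = 32  # CPU count of the host in the example
# (r, us, sy) from the two vmstat rows above.
SAMPLES = [(34, 96, 1), (32, 98, 1)]


def summarise(samples: list[tuple[int, int, int]],
              ncpu: int) -> list[tuple[int, float, bool]]:
    """Per row: (utilisation % as us + sy, run-queue depth per CPU, r > ncpu)."""
    return [(us + sy, r / ncpu, r > ncpu) for r, us, sy in samples]


for util, depth, saturated in summarise(SAMPLES, NCPU):
    print(f"util={util}% r/ncpu={depth:.4f} saturated={saturated}")
```

With the strict r > CPU count rule, the first row (r/ncpu = 1.0625) is over the line and the second (exactly 1.0) sits on it, while utilisation stays at 97-99% in both.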

When saturation is the better signal

Linux load average is ambiguous: it counts tasks in uninterruptible sleep (usually disk I/O) alongside runnable ones. Run-queue latency, measured via eBPF (for example BCC's runqlat), gives the cleanest scheduler-layer view. vmstat's r sits in between: more specific than load average, less deep than run-queue latency, and available on every stock Linux host without extra tooling.
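
Without eBPF, a coarser per-task view of run-queue latency can be read from scheduler statistics. This sketch is a swapped-in technique, not the eBPF approach: it assumes the three-field /proc/<pid>/schedstat format from the kernel's sched-stats documentation (time on CPU in ns, time waiting on a run queue in ns, timeslices run), and parse_schedstat and mean_runq_wait_us are hypothetical names. It yields an average wait per timeslice, so it hides the tail that a latency histogram would show.

```python
from __future__ import annotations


def parse_schedstat(text: str) -> tuple[int, int, int]:
    """Parse /proc/<pid>/schedstat: (ns on CPU, ns waiting on a run queue, timeslices)."""
    on_cpu, wait, slices = (int(v) for v in text.split()[:3])
    return on_cpu, wait, slices


def mean_runq_wait_us(before: str, after: str) -> float:
    """Average run-queue wait per timeslice between two samples, in microseconds."""
    _, wait0, slices0 = parse_schedstat(before)
    _, wait1, slices1 = parse_schedstat(after)
    slices = slices1 - slices0
    # No new timeslices means the task never ran in the interval; report 0.
    return (wait1 - wait0) / slices / 1000 if slices else 0.0
```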

