
CONCEPT Cited by 2 sources

Load average

Definition

Load average on Linux is a set of three numbers reported by the uptime command and several other interfaces (top, w, the /proc/loadavg file): exponentially damped moving averages of the number of tasks that are either runnable (running or waiting for a CPU) or in uninterruptible sleep (TASK_UNINTERRUPTIBLE, usually blocked on disk I/O), over 1-minute, 5-minute, and 15-minute windows.
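A sketch of what "exponentially damped" means in practice — an illustration of the recurrence with made-up numbers, not the kernel's implementation (which uses fixed-point arithmetic and samples roughly every 5 seconds):

```python
import math

def damped_update(avg, n_tasks, window_s, tick_s=5.0):
    """One update step: the average decays toward the instantaneous
    count of runnable + uninterruptible tasks, with time constant
    window_s (60, 300, or 900 seconds)."""
    decay = math.exp(-tick_s / window_s)
    return avg * decay + n_tasks * (1.0 - decay)

# With 32 such tasks sustained, the 1-minute average climbs
# much faster than the 15-minute one from the same starting point:
avg_1m = damped_update(19.0, 32, 60.0)
avg_15m = damped_update(19.0, 32, 900.0)
```

This is why the three windows react at different speeds to the same demand.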

$ uptime
 23:51:26 up 21:31, 1 user, load average: 30.02, 26.43, 19.02
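The same three numbers are available programmatically; a minimal sketch using Python's os.getloadavg(), which on Linux is backed by /proc/loadavg:

```python
import os

# Returns the 1-, 5-, and 15-minute averages as floats —
# the same values uptime prints.
one, five, fifteen = os.getloadavg()
print(f"load average: {one:.2f}, {five:.2f}, {fifteen:.2f}")
```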

Not pure CPU

Linux's load average differs from the classical Unix definition by including I/O-blocked tasks. A host with idle CPUs but saturated disks (many tasks blocked waiting on reads) will show a high load average — the metric measures system demand, not CPU utilisation. This is a frequent surprise to operators coming from Solaris / *BSD.

Three numbers = trend

The 1 / 5 / 15-minute averages are useful as a trend, not as three separate measurements:

  • 1-min ≫ 15-min — load is rising; the incident is happening now.
  • 1-min ≪ 15-min — load is falling; the worst may already be over, "you might have logged in too late and missed the issue."
  • All three ≈ same — steady-state operation, no recent perturbation.
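The heuristic above can be sketched as a small comparison (load_trend and its 10% tolerance band are illustrative choices, not a standard):

```python
def load_trend(one_min, fifteen_min, tol=0.1):
    """Classify trajectory from the 1- vs 15-minute averages.
    tol is an arbitrary relative band for 'roughly equal'."""
    if one_min > fifteen_min * (1 + tol):
        return "rising"   # the incident is happening now
    if one_min < fifteen_min * (1 - tol):
        return "falling"  # you may have arrived after the peak
    return "steady"

print(load_trend(30.02, 19.02))  # the uptime sample above: rising
```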

What load average is good for

Worth a quick look only. Netflix's framing: "This gives a high level idea of resource load (or demand), but can't be properly understood without other tools." Use it to:

  • Notice that something is demanding more resources than the host is providing — but then immediately move to vmstat / mpstat / iostat to identify which resource.
  • Compare trajectory across the three windows to tell if you're arriving at or after the peak.

What load average is not good for

  • Not a CPU utilisation metric (use mpstat -P ALL 1 for that).
  • Not a CPU saturation metric (use vmstat 1's r column or run queue latency via eBPF for that).
  • Not an SLO or an alarm threshold on its own — a load average of 30 on a 32-CPU box with idle CPUs because everything is blocked on disk is a very different failure mode from a load average of 30 from CPU-bound work.
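A quick way to see why a raw threshold misleads: normalising by CPU count (load_per_cpu is an illustrative helper, not a standard metric) only tells you that demand exceeds CPU capacity if the tasks are actually CPU-bound; I/O-blocked tasks inflate the ratio identically.

```python
import os

def load_per_cpu(load_1m, ncpus=None):
    """Illustrative: 1-minute load normalised by CPU count.
    A value near 1.0 can mean saturated CPUs *or* a pile-up of
    disk-blocked tasks; the ratio alone cannot distinguish them."""
    return load_1m / (ncpus or os.cpu_count() or 1)

# The 32-CPU example from above: the ratio is the same whether
# the ~30 tasks are spinning on CPU or blocked in disk wait.
ratio = load_per_cpu(30.02, ncpus=32)
```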

Example from Netflix production

From the [[sources/2025-07-29-netflix-linux-performance-analysis-in-60-seconds|60-second checklist post]]:

load average: 30.02, 26.43, 19.02

Rising trajectory (1-min 30 vs 15-min 19). The follow-on vmstat on the same host shows r ≈ 32-34, us ≈ 98, sy ≈ 1, wa ≈ 0, which resolves the ambiguity: it's CPU-bound user-space work saturating the 32 CPUs, not disk wait. Without the USE Method follow-up, the load average alone would not have been enough to localise the problem.

Seen in

  • sources/2025-07-29-netflix-linux-performance-analysis-in-60-seconds — Netflix Performance Engineering's 60-second checklist treats load average as command #1 (uptime) specifically because it's cheap and directional but explicitly hands off to vmstat / mpstat / iostat for localisation. The post establishes the "demand, not utilisation" framing as canonical wiki vocabulary.

  • sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — Shlomi Noach evaluates load average as a candidate database-throttler signal and classifies it as unsuitable for a static-threshold throttler for the same reason it doesn't SLO well: "a common rough indicator is a 1 threshold for (load average)/(num CPUs). This is again a metric that must agree with your own systems. Some database deployments famously push their servers to their limits with load averages soaring far above 1 per CPU." Canonical wiki framing of load average as the queue-length fallback to run-queue latency — useful as a directional dashboard signal, insufficient as a threshold signal. Places load average in the same metric category as concepts/threads-running-mysql: useful symptom, no stable threshold.
