CONCEPT Cited by 2 sources
Load average¶
Definition¶
Load average on Linux is a set of three numbers exposed by the `uptime` command and several others (`top`, `w`, `/proc/loadavg`): exponentially damped moving averages of the number of tasks either runnable on CPU or in uninterruptible I/O wait (`TASK_UNINTERRUPTIBLE`, usually blocked on disk), over 1-minute, 5-minute, and 15-minute windows.
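The exponential damping in that definition can be sketched in a few lines. This is an illustrative floating-point model, not the kernel's implementation (the kernel uses fixed-point arithmetic and a more involved sampling scheme); it only shows why the three windows react at different speeds to the same demand:

```python
import math

# The kernel samples active tasks roughly every 5 seconds;
# each window gets its own decay factor per sample.
SAMPLE_INTERVAL = 5.0  # seconds

def update(load: float, active_tasks: int, window_seconds: float) -> float:
    """One EWMA step: the old average decays, the current count fills the gap."""
    e = math.exp(-SAMPLE_INTERVAL / window_seconds)
    return load * e + active_tasks * (1.0 - e)

# Simulate 10 minutes of a constant 4 runnable tasks, starting from idle.
loads = {60: 0.0, 300: 0.0, 900: 0.0}
for _ in range(int(600 / SAMPLE_INTERVAL)):
    for window in loads:
        loads[window] = update(loads[window], 4, window)
```

After ten minutes of steady demand the 1-minute average has converged to ~4 while the 15-minute average is still climbing, which is exactly the trend signal the next section relies on.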
Not pure CPU¶
Linux's load average differs from the classical Unix definition by including I/O-blocked tasks. A host with idle CPUs but saturated disks (many tasks blocked waiting on reads) will show a high load average — the metric measures system demand, not CPU utilisation. This is a frequent surprise to operators coming from Solaris / *BSD.
Three numbers = trend¶
The 1 / 5 / 15-minute averages are useful as a trend, not as three separate measurements:
- 1-min ≫ 15-min — load is rising; the incident is happening now.
- 1-min ≪ 15-min — load is falling; the worst may already be over, "you might have logged in too late and missed the issue."
- All three ≈ same — steady-state operation, no recent perturbation.
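That three-way reading can be captured as a toy classifier. The 10% tolerance for "approximately the same" is an arbitrary choice for illustration, not from the source:

```python
def load_trend(one_min: float, fifteen_min: float, tolerance: float = 0.1) -> str:
    """Classify load trajectory from the 1-min and 15-min averages.

    tolerance is a hypothetical relative threshold for "all three ~same".
    """
    if fifteen_min == 0:
        return "rising" if one_min > 0 else "steady"
    ratio = (one_min - fifteen_min) / fifteen_min
    if ratio > tolerance:
        return "rising"   # the incident is happening now
    if ratio < -tolerance:
        return "falling"  # you may have logged in too late
    return "steady"
```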
What load average is good for¶
Worth a quick look only. Netflix's framing: "This gives a high level idea of resource load (or demand), but can't be properly understood without other tools." Use it to:
- Notice that something is demanding more resources than the host is providing — but then immediately move to `vmstat`/`mpstat`/`iostat` to identify which resource.
- Compare trajectory across the three windows to tell if you're arriving at or after the peak.
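For the "quick look", the three windows can be read straight from `/proc/loadavg`. A minimal parser, using a fabricated sample line shaped like the rising-trajectory case (1-min ~30 vs 15-min ~19); on a live host you would read the real file:

```python
def parse_loadavg(line: str) -> tuple[float, float, float]:
    """Parse the first three fields of a /proc/loadavg line.

    The remaining fields (runnable/total tasks, last PID) are ignored here.
    """
    one, five, fifteen = line.split()[:3]
    return float(one), float(five), float(fifteen)

# On a live host:
#   with open("/proc/loadavg") as f:
#       one, five, fifteen = parse_loadavg(f.read())
sample = "30.02 26.14 19.38 3/2320 43005"  # illustrative, not captured output
one, five, fifteen = parse_loadavg(sample)
```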
What load average is not good for¶
- Not a CPU utilisation metric (use `mpstat -P ALL 1` for that).
- Not a CPU saturation metric (use `vmstat 1`'s `r` column or run-queue latency via eBPF for that).
- Not an SLO or an alarm threshold on its own — a load average of 30 on a 32-CPU box with idle CPUs because everything is blocked on disk is a very different failure mode from a load average of 30 from CPU-bound work.
Example from Netflix production¶
From the [[sources/2025-07-29-netflix-linux-performance-analysis-in-60-seconds|60-second checklist post]]:
Rising trajectory (1-min 30 vs 15-min 19). The follow-on `vmstat` on the same host shows `r` ≈ 32-34, `us` ≈ 98, `sy` ≈ 1, `wa` ≈ 0, which resolves the ambiguity: it's CPU-bound user-space work saturating the 32 CPUs, not disk wait. Without the USE-Method follow-up, the load average alone would not have been enough to localise the problem.
Seen in¶
- sources/2025-07-29-netflix-linux-performance-analysis-in-60-seconds — Netflix Performance Engineering's 60-second checklist treats load average as command #1 (`uptime`) specifically because it's cheap and directional but explicitly hands off to `vmstat`/`mpstat`/`iostat` for localisation. The post establishes the "demand, not utilisation" framing as canonical wiki vocabulary.
- sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — Shlomi Noach evaluates load average as a candidate database-throttler signal and classifies it as unsuitable for a static-threshold throttler for the same reason it doesn't SLO well: "a common rough indicator is a `1` threshold for `(load average)/(num CPUs)`. This is again a metric that must agree with your own systems. Some database deployments famously push their servers to their limits with load averages soaring far above `1` per CPU." Canonical wiki framing of load average as the queue-length fallback to run-queue latency — useful as a directional dashboard signal, insufficient as a threshold signal. Places load average in the same metric category as concepts/threads-running-mysql: useful symptom, no stable threshold.
Related¶
- concepts/use-method · concepts/cpu-utilization-vs-saturation
- concepts/io-wait — why load average rises on disk-bound hosts with idle CPUs.
- concepts/run-queue-latency — the cleaner scheduler-layer saturation signal for CPU contention.
- systems/vmstat — the `r` column is the follow-up test that resolves load-average ambiguity.
- patterns/sixty-second-performance-checklist