IO wait

Definition

%iowait on Linux is the fraction of CPU time reported as idle while at least one CPU-local task is blocked on disk I/O. It appears as the wa column in vmstat and top, and as the %iowait column in mpstat / sar.

Mechanically, %iowait is still a form of idle time — the CPU is not executing any task. What distinguishes it from plain %idle is the accounting hint: "I am idle because tasks that would otherwise be runnable are blocked waiting for disk I/O to complete."
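The accounting described above can be sketched from two samples of the aggregate cpu line in /proc/stat. This is a minimal illustration, not the implementation used by vmstat or mpstat: the function names are hypothetical, the sample lines are made up, and it assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
# Sketch: derive %iowait and %idle from two /proc/stat "cpu" samples.
# Function names and sample values are illustrative assumptions.

CPU_FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Guest time is already included in user/nice, so only the first
    # eight counters are summed.
    values = [int(v) for v in fields[1:1 + len(CPU_FIELDS)]]
    return dict(zip(CPU_FIELDS, values))


def cpu_percentages(before, after):
    """Return each counter's share of the ticks elapsed between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: b[name] - a[name] for name in a}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between samples")
    return {name: 100.0 * d / total for name, d in deltas.items()}


# Made-up samples taken some interval apart.
sample_before = "cpu  1000 0 500 8000 500 0 0 0 0 0"
sample_after = "cpu  1100 0 550 8700 650 0 0 0 0 0"

pct = cpu_percentages(sample_before, sample_after)
print(f"%idle={pct['idle']:.1f} %iowait={pct['iowait']:.1f}")
```

In this sample, 150 of the 1000 elapsed ticks were counted as iowait and 700 as plain idle, so the CPU was not executing anything for 85% of the interval; iowait only labels part of that idle time with a probable cause.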

Why it's a disk signal, not a CPU signal

Netflix's Brendan Gregg:

A constant degree of wait I/O points to a disk bottleneck; this is where the CPUs are idle, because tasks are blocked waiting for pending disk I/O. You can treat wait I/O as another form of CPU idle, one that gives a clue as to why they are idle.

The metric doesn't say the CPU is overloaded — it's not. It says the disk is slow enough to keep the CPU idle. Investigation should pivot to iostat -xz 1 for per-device breakdown:

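  • %util: percentage of time the device was busy servicing I/O.
  • await: average I/O completion time (queue + service) in ms; newer sysstat releases report it separately as r_await and w_await.
  • avgqu-sz (aqu-sz in newer sysstat releases): average queue depth; values above 1 can indicate saturation, although devices that service requests in parallel can sustain deeper queues.
  • r/s, w/s, rkB/s, wkB/s: the applied workload, for workload characterisation.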
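To make the per-device columns concrete, the sketch below derives approximations of %util, await, and avgqu-sz from two samples of a /proc/diskstats line. It is an illustration under assumptions, not iostat's actual code: the names DiskSample, parse_diskstats_line, and device_metrics are hypothetical, the sample lines are made up, and only the first 14 fields of the documented diskstats layout are used.

```python
# Sketch: approximate iostat-style metrics from two /proc/diskstats samples.
# Names and sample values are illustrative assumptions.
from collections import namedtuple

DiskSample = namedtuple("DiskSample", "ios io_ms busy_ms weighted_ms")


def parse_diskstats_line(line):
    """Extract the counters needed for %util, await, and queue depth."""
    f = line.split()
    name = f[2]
    reads, writes = int(f[3]), int(f[7])       # completed reads / writes
    read_ms, write_ms = int(f[6]), int(f[10])  # time spent reading / writing
    busy_ms = int(f[12])                       # time the device had I/O in flight
    weighted_ms = int(f[13])                   # busy time weighted by queue depth
    return name, DiskSample(reads + writes, read_ms + write_ms, busy_ms, weighted_ms)


def device_metrics(before, after, interval_s):
    """Return (util_percent, await_ms, avg_queue_depth) for one interval."""
    interval_ms = interval_s * 1000.0
    d_ios = after.ios - before.ios
    await_ms = (after.io_ms - before.io_ms) / d_ios if d_ios else 0.0
    util = min(100.0, 100.0 * (after.busy_ms - before.busy_ms) / interval_ms)
    queue_depth = (after.weighted_ms - before.weighted_ms) / interval_ms
    return util, await_ms, queue_depth


# Made-up samples for one device, taken 1 second apart.
line_before = "8 0 sda 1000 0 80000 5000 2000 0 160000 20000 0 9000 25000"
line_after = "8 0 sda 1100 0 88000 6000 2100 0 168000 23000 3 9900 29000"

name, s0 = parse_diskstats_line(line_before)
_, s1 = parse_diskstats_line(line_after)
util, await_ms, queue_depth = device_metrics(s0, s1, interval_s=1.0)
print(f"{name}: %util={util:.1f} await={await_ms:.1f}ms avgqu-sz={queue_depth:.1f}")
```

With these sample numbers the device was busy for 900 ms of the 1000 ms interval, completed 200 I/Os that took 4000 ms in total, and averaged 4 requests in flight: a pattern consistent with a device near saturation.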

Common misinterpretations

  • "%iowait is 0, so disks are fine": not necessarily. Because %iowait is a subset of idle time, it can read 0 while tasks are blocked on I/O if the CPU is kept busy by other work. Application latency can still be hurt by slow I/O that never shows up in the metric. %iowait is a hint, not a proof.
  • "High %iowait means I need more CPU": wrong direction. %iowait time is idle CPU time, so adding CPUs doesn't help. Investigate disk throughput, queue depth, or asynchronous I/O patterns instead.
  • "%iowait > X% is always bad": it depends on the workload. A batch/ETL host that is intentionally I/O-bound will show sustained high %iowait; a host serving interactive requests shouldn't.

Application techniques that mask %iowait

Many production systems use asynchronous I/O (read-ahead, buffered writes, async queues) specifically so that user-visible request latency doesn't block on disk. Netflix's framing:

Bear in mind that poor performing disk I/O isn't necessarily an application issue. Many techniques are typically used to perform I/O asynchronously, so that the application doesn't block and suffer the latency directly (e.g., read-ahead for reads, and buffering for writes).

This is why a host with slow disks can still feel fast to users, and why %iowait ≈ 0 doesn't mean the disks are unburdened.
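As a rough illustration of the buffering idea, the sketch below hands writes to a background thread so the caller never waits on the disk. The class name WriteBehindLog and its methods are hypothetical, and this is only one of many ways to decouple request latency from disk latency; it assumes that records may be lost if the process dies before close() is called.

```python
# Sketch: a write-behind log that keeps disk writes off the caller's path.
# WriteBehindLog is a hypothetical name; durability is deliberately relaxed.
import queue
import threading


class WriteBehindLog:
    def __init__(self, path):
        self._queue = queue.Queue()
        self._file = open(path, "a", encoding="utf-8")
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def append(self, record):
        """Enqueue the record and return immediately, without touching the disk."""
        self._queue.put(record)

    def _drain(self):
        # Only this background thread performs file writes.
        while True:
            record = self._queue.get()
            if record is None:  # sentinel posted by close()
                break
            self._file.write(record + "\n")

    def close(self):
        """Flush everything queued so far, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
        self._file.flush()
        self._file.close()
```

A request handler would call append() and continue, while close() is called once at shutdown. If the disk is slow, the queue grows and the background thread is the one that blocks; user-facing latency stays flat even though the device may be saturated.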
