
SYSTEM Cited by 1 source

iostat

What it is

iostat reports per-block-device I/O statistics on Linux and ships in the sysstat package. It is the canonical tool for disk-side performance triage: the tool to pivot to when vmstat's wa column or a high %iowait points at disk as the bottleneck.

Canonical invocation

iostat -xz 1
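  • -x: extended statistics, which add the useful columns (await, avgqu-sz, %util). Newer sysstat releases report await only per direction (r_await, w_await) and name the queue column aqu-sz.
  • -z: omit devices with no activity during the interval, so the output shows only what is actually busy.
  • 1: one-second interval, repeated until interrupted. The first report shows averages since boot; -y skips it.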
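iostat works from the kernel's cumulative per-device counters (exposed in /proc/diskstats): each interval's report is built from the difference between two snapshots. Below is a minimal sketch of that sampling step. It assumes Linux and the standard /proc/diskstats line layout. The helpers parse_diskstats, active_devices and sample are hypothetical, and the "counters changed" test only approximates -z; it is not sysstat's exact rule.

```python
import time


def parse_diskstats(text):
    """Map device name -> tuple of integer counters from /proc/diskstats text.

    Each line is: major minor name counter1 counter2 ...
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip malformed or truncated lines
        stats[parts[2]] = tuple(int(x) for x in parts[3:])
    return stats


def active_devices(before, after):
    """Rough approximation of -z: keep devices whose counters changed."""
    return sorted(
        name for name, counters in after.items()
        if name in before and counters != before[name]
    )


def sample(interval_s=1.0, path="/proc/diskstats"):
    """Take two snapshots interval_s apart (Linux only)."""
    with open(path) as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_diskstats(f.read())
    return first, second
```

On a Linux host, `first, second = sample(1.0)` followed by `active_devices(first, second)` lists roughly the devices that -z would keep for that interval.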

Key output columns

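Column              Meaning
r/s, w/s            Reads / writes completed per second (IOPS; characterises the applied workload)
rkB/s, wkB/s        Read / write throughput in kilobytes per second
await               Average I/O completion time in ms, including both time queued and time being serviced
r_await, w_await    await split by direction (reads, writes)
avgqu-sz            Average queue depth: requests queued or in flight (named aqu-sz in newer sysstat releases)
svctm               Average service time; the iostat man page warns not to trust it, so rely on await instead
%util               Percent of elapsed time the device had at least one I/O in flight (time busy)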
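Each column is a delta of a kernel counter divided by the interval. The sketch below assumes the field order documented for /proc/diskstats in the kernel's Documentation/admin-guide/iostats.rst. extended_stats is a hypothetical helper; its formulas approximate the column definitions above rather than reproduce sysstat's code. svctm is deliberately left out.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def extended_stats(before, after, interval_s):
    """Approximate iostat -x columns from two /proc/diskstats counter tuples.

    Counter positions (0-based, after major/minor/name):
      0 reads completed, 2 sectors read, 3 ms spent reading,
      4 writes completed, 6 sectors written, 7 ms spent writing,
      9 ms with I/O in flight (io_ticks), 10 weighted ms doing I/O.
    """
    d = [a - b for a, b in zip(after, before)]
    reads, writes = d[0], d[4]
    ios = reads + writes
    interval_ms = interval_s * 1000.0
    return {
        "r/s": reads / interval_s,
        "w/s": writes / interval_s,
        "rkB/s": d[2] * SECTOR_BYTES / 1024 / interval_s,
        "wkB/s": d[6] * SECTOR_BYTES / 1024 / interval_s,
        # await: total time (queue + service) divided by completed I/Os
        "await": (d[3] + d[7]) / ios if ios else 0.0,
        "r_await": d[3] / reads if reads else 0.0,
        "w_await": d[7] / writes if writes else 0.0,
        # average queue depth: weighted in-flight time over elapsed time
        "aqu-sz": d[10] / interval_ms,
        # %util: share of elapsed time with at least one I/O in flight,
        # clamped here because sampling jitter can push it slightly past 100
        "%util": min(100.0, d[9] / interval_ms * 100.0),
    }
```

Note that %util and aqu-sz come from different counters. io_ticks only records whether any I/O was in flight, while the weighted counter grows with every concurrent request.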

Interpretation rules from the Netflix checklist

  • %util above 60% usually hurts performance (it should show up in await); near 100% usually means saturation. The crucial caveat: "if the storage device is a logical disk device fronting many back-end disks, then 100% utilization may just mean that some I/O is being processed 100% of the time, however, the back-end disks may be far from saturated, and may be able to handle much more work." This applies to LVM, software RAID, and virtualised cloud block storage.
  • avgqu-sz above 1 is often evidence of saturation, with the same caveat: devices can serve requests in parallel, and virtual devices may be fronting many concurrent back-end requests.
  • await larger than expected points to device saturation or a device problem; it includes both queue time and service time.
  • r/s, w/s, rkB/s, wkB/s characterise the workload: what load is actually applied? A performance problem "may simply be due to an excessive load applied."
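The rules of thumb above can be encoded as a small triage helper. This is a sketch only. checklist_hints is hypothetical. expected_await_ms stands for whatever latency the operator expects from the device. saturated_pct=99.0 is an assumed cut-off for "near 100%", since the checklist gives no exact number.

```python
def checklist_hints(util_pct, aqu_sz, await_ms, expected_await_ms,
                    logical_device=False, saturated_pct=99.0):
    """Apply the checklist's rules of thumb to one device's iostat -x numbers.

    logical_device: True for LVM, software RAID, or cloud block storage,
    where %util and queue depth can overstate saturation.
    saturated_pct: assumed threshold for "near 100%" (not from the checklist).
    """
    caveat = (" (logical device: back-end disks may still have headroom)"
              if logical_device else "")
    hints = []
    if util_pct >= saturated_pct:
        hints.append("%util near 100%: usually saturated" + caveat)
    elif util_pct > 60.0:
        hints.append("%util above 60%: usually hurts performance" + caveat)
    if aqu_sz > 1.0:
        hints.append("queue depth above 1: often saturation" + caveat)
    if await_ms > expected_await_ms:
        hints.append("await above expected: saturation or a device problem")
    return hints
```

The helper returns hints rather than verdicts. The checklist presents these as starting points for investigation, and the logical-device caveat can invalidate the first two.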

The %util interpretation problem on modern devices

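The %util column dates from single-queue HDDs, which mechanically serviced one request at a time, so "busy" and "saturated" were the same thing. %util only measures the fraction of time with at least one request in flight. On NVMe SSDs that service many commands concurrently, %util = 100% is therefore the start of the useful range, not the end: the device can be 100% busy while using only a fraction of its parallel capacity. Cloud block devices (EBS, Azure Disk, GCE PD) sit in front of many back-end devices and are even more decoupled from the %util signal. The Netflix checklist defines %util as the percent of time the device was busy, and its logical-device caveat is why %util should not be read as a saturation signal on its own.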
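The gap between "busy" and "saturated" shows up clearly in a toy model. %util counts time with at least one request in flight, while average queue depth counts every in-flight request. The sketch below computes both from hypothetical (start_ms, end_ms) request intervals. It models the two definitions, not any particular device, and busy_and_depth is a hypothetical helper.

```python
def busy_and_depth(intervals, window_ms):
    """Given (start_ms, end_ms) in-flight intervals inside [0, window_ms],
    return (%util, average number of requests in flight)."""
    # %util: merge overlapping intervals, then measure total covered time.
    busy = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        busy += cur_end - cur_start
    # Average depth: every in-flight request counts, so sum the durations.
    depth = sum(end - start for start, end in intervals) / window_ms
    return busy / window_ms * 100.0, depth
```

One request kept in flight for the whole second gives `busy_and_depth([(0, 1000)], 1000) == (100.0, 1.0)`. Thirty-two concurrent requests give `(100.0, 32.0)`. %util is identical in both cases, so on a device that can service 32 commands at once, the first case leaves most of its capacity idle.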

Why async I/O changes the interpretation

Netflix's broader caveat on disk performance:

Bear in mind that poor performing disk I/O isn't necessarily an application issue. Many techniques are typically used to perform I/O asynchronously, so that the application doesn't block and suffer the latency directly (e.g., read-ahead for reads, and buffering for writes).

High await on a device does not automatically mean the application is suffering; the application may have buffered or batched the request out of its critical path.
