CONCEPT Cited by 1 source

USE Method

Definition

USE stands for Utilisation, Saturation, Errors: a methodology for locating performance bottlenecks proposed by Brendan Gregg and used by Netflix Performance Engineering. For every resource class in a system (CPUs, memory buses, network interfaces, disks, storage controllers, network switches, etc.), check three dimensions:

  • Utilisation: the average fraction of time the resource was busy servicing work (e.g. %util on a disk, us + sy on a CPU).
  • Saturation: the degree to which the resource has more work than it can handle, typically surfaced as queue depth or wait time (e.g. vmstat's r column, iostat's avgqu-sz, which newer sysstat releases call aqu-sz, or run-queue latency).
  • Errors: the count of error events on the resource (kernel errors in dmesg, interface errors from sar -n EDEV, TCP retransmits from sar -n ETCP).
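
As a rough illustration of the three dimensions, the sketch below reduces raw measurements for one resource over a sampling interval to a USE triple. The names `UseReading` and `summarise`, and the choice to express saturation as mean queue depth, are assumptions made for this example; they do not come from Gregg's post or from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class UseReading:
    utilisation: float  # fraction of the interval the resource was busy (0.0 to 1.0)
    saturation: float   # mean amount of queued work over the interval
    errors: int         # error events counted during the interval


def summarise(busy_seconds: float, interval_seconds: float,
              queue_samples: list[int], error_count: int) -> UseReading:
    """Reduce raw measurements for one resource to a USE triple (hypothetical helper)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    utilisation = busy_seconds / interval_seconds
    # Assumption: saturation is summarised as the mean of the sampled queue depths.
    saturation = sum(queue_samples) / len(queue_samples) if queue_samples else 0.0
    return UseReading(utilisation, saturation, error_count)


reading = summarise(busy_seconds=0.5, interval_seconds=1.0,
                    queue_samples=[0, 2, 4], error_count=3)
print(reading)
```

Real tools differ in how they summarise saturation (instantaneous queue length, average queue size, or wait time), so the mean used here is only one reasonable choice.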

Why it works

USE aims to be complete, in that it covers the dimensions along which a resource can become a bottleneck, and it is actionable: each dimension has a known measurement in standard Linux tools. You don't need to guess "is it the disk or the CPU?" Instead, you enumerate every resource and every dimension in order, exonerate resources by process of elimination, and narrow the search space.
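
The enumerate-and-eliminate idea can be sketched as a grid of (resource, dimension) cells. The resource list, the function names, and the boolean "looked unhealthy" flag below are illustrative assumptions, not part of the method's definition.

```python
from itertools import product

# Assumed resource classes for the example; a real system would list its own.
RESOURCES = ["cpu", "memory", "disk", "network"]
DIMENSIONS = ["utilisation", "saturation", "errors"]


def use_checklist(resources=RESOURCES, dimensions=DIMENSIONS):
    """Return every (resource, dimension) cell that has to be checked."""
    return list(product(resources, dimensions))


def remaining_suspects(findings):
    """Keep resources with at least one flagged cell; the rest are exonerated.

    `findings` maps (resource, dimension) -> True if that cell looked unhealthy.
    """
    return sorted({resource for (resource, _), bad in findings.items() if bad})


cells = use_checklist()
findings = {cell: False for cell in cells}      # start with every cell healthy
findings[("disk", "saturation")] = True         # one cell flagged during the walk
print(len(cells), remaining_suspects(findings))
```

Walking the full grid is what makes the elimination trustworthy: a resource is only cleared once all three of its cells have been looked at.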

The methodology is the organising principle behind the 60-second Linux performance checklist at Netflix (see patterns/sixty-second-performance-checklist). Each of the 10 commands (uptime, dmesg | tail, vmstat 1, mpstat -P ALL 1, pidstat 1, iostat -xz 1, free -m, sar -n DEV 1, sar -n TCP,ETCP 1, top) maps to USE cells across CPU / memory / disk / network.

The utilisation-vs-saturation distinction matters

A common anti-pattern is to measure only utilisation and treat high utilisation as "the problem." High utilisation can coexist with unsaturated service: a CPU can be 99% busy with nothing queued, in which case response latency is fine. The more reliable production signal for identifying a bottleneck is saturation (queue depth, wait time, unacknowledged requests), not utilisation. See concepts/cpu-utilization-vs-saturation for the CPU-specific instance and concepts/run-queue-latency for the scheduler-queue-as-saturation-signal formulation.
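
A minimal sketch of that distinction for a CPU follows. It assumes the caller already has a utilisation fraction and a runnable-task count (for example, vmstat's r column); the function name and the 0.9 "busy" threshold are hypothetical choices for illustration.

```python
def classify_cpu(utilisation: float, runnable: int, cpu_count: int,
                 busy_threshold: float = 0.9) -> str:
    """Separate 'busy' from 'saturated' for a CPU (illustrative thresholds)."""
    # Saturation: more runnable tasks than CPUs means some work is waiting.
    if runnable > cpu_count:
        return "saturated"
    # High utilisation with no queue: capacity is in use, but nothing waits.
    if utilisation >= busy_threshold:
        return "busy, not saturated"
    return "headroom"


print(classify_cpu(0.99, runnable=4, cpu_count=8))
print(classify_cpu(0.99, runnable=20, cpu_count=8))
```

Both calls use the same 99% utilisation, yet only the second describes a machine where work is queuing, which is the case that tends to show up as latency.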

Look for errors and saturation first

Gregg's explicit sequencing advice: "Look for errors and saturation metrics, as they are both easy to interpret, and then resource utilization." Errors typically point at what broke, saturation at what's overloaded, utilisation at where capacity is going. Errors and saturation have sharper thresholds (was there a TCP retransmit? is the run queue deeper than CPU count?); utilisation is a gradient that requires context.
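
One way to encode this sequencing is to sort the cells to be inspected so that errors come first, saturation second, and utilisation last. The `PRIORITY` table and `triage_order` helper are assumptions for this sketch, not something the post prescribes.

```python
# Lower number = look at it earlier, following the advice quoted above.
PRIORITY = {"errors": 0, "saturation": 1, "utilisation": 2}


def triage_order(cells):
    """Order (resource, dimension) cells: errors, then saturation, then utilisation.

    sorted() is stable, so cells within one dimension keep their input order.
    """
    return sorted(cells, key=lambda cell: PRIORITY[cell[1]])


cells = [
    ("cpu", "utilisation"),
    ("disk", "saturation"),
    ("network", "errors"),
    ("cpu", "saturation"),
]
for resource, dimension in triage_order(cells):
    print(resource, dimension)
```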

Extension beyond OS resources

The method generalises. For any resource with a queue — a connection pool, a thread pool, a Kafka topic partition, a DynamoDB shard — you can define utilisation (fraction of slots in use), saturation (queue depth + wait time), and errors (rejections, timeouts). The discipline is the same: check all three for every resource.
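
As a sketch of that generalisation, the example below derives the three dimensions for a generic pool. `PoolSnapshot` and its fields are hypothetical; real connection-pool or thread-pool libraries expose different counters under different names.

```python
from dataclasses import dataclass


@dataclass
class PoolSnapshot:
    size: int                   # total slots in the pool
    in_use: int                 # slots currently checked out
    waiting: int                # callers queued for a slot
    oldest_wait_seconds: float  # how long the longest-waiting caller has waited
    rejected: int               # requests refused because the queue was full
    timed_out: int              # requests that gave up waiting


def pool_use(snapshot: PoolSnapshot) -> dict:
    """Map pool counters onto utilisation, saturation, and errors."""
    if snapshot.size <= 0:
        raise ValueError("pool size must be positive")
    return {
        "utilisation": snapshot.in_use / snapshot.size,
        "saturation": {
            "queue_depth": snapshot.waiting,
            "oldest_wait_seconds": snapshot.oldest_wait_seconds,
        },
        "errors": snapshot.rejected + snapshot.timed_out,
    }


report = pool_use(PoolSnapshot(size=10, in_use=10, waiting=5,
                               oldest_wait_seconds=1.5, rejected=2, timed_out=1))
print(report)
```

A pool at 100% utilisation with an empty queue is merely fully used; the non-zero queue depth and wait time in this snapshot are what mark it as saturated.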

Seen in

  • sources/2025-07-29-netflix-linux-performance-analysis-in-60-seconds — canonical post on the wiki. Brendan Gregg (Netflix Performance Engineering) encodes the USE Method as a 10-command shell checklist; every command in the sequence maps to one or more USE cells across CPU / memory / disk / network. Example interpretation rules from the post — %util > 60% (utilisation threshold), avgqu-sz > 1 (saturation), r > CPU count (saturation), dmesg | tail errors scan — show each of the three dimensions in action on Linux /proc-backed tools.
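
The interpretation rules quoted in that entry can be written down as simple threshold checks. This is only a sketch: the function name, parameter names, and returned labels are hypothetical, the inputs are assumed to have been read from iostat, vmstat, and dmesg by some other means, and the thresholds are rules of thumb rather than hard limits.

```python
def interpret(util_pct: float, avgqu_sz: float, r: int,
              cpu_count: int, dmesg_error_lines: int) -> list[str]:
    """Apply the example rules of thumb and return the USE cells they flag."""
    flags = []
    if dmesg_error_lines > 0:   # errors: anything suspicious in dmesg | tail
        flags.append("errors")
    if avgqu_sz > 1:            # saturation: requests queuing at the disk
        flags.append("disk_saturation")
    if r > cpu_count:           # saturation: more runnable tasks than CPUs
        flags.append("cpu_saturation")
    if util_pct > 60:           # utilisation: disk busy more than 60% of the time
        flags.append("disk_utilisation")
    return flags


print(interpret(util_pct=75.0, avgqu_sz=0.4, r=12, cpu_count=8,
                dmesg_error_lines=0))
```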