Skip to content

CONCEPT Cited by 2 sources

Performance per watt

Definition

Performance per watt is the ratio of useful work (throughput, operations completed, queries served) to electrical power consumed. As a hardware-selection criterion, it displaces raw performance because at datacenter scale the binding constraint is power delivery and cooling, not silicon performance.

Why it matters at datacenter scale

  • Rack power is capped by facility design (typically 15–30 kW per rack in commodity datacenters, higher in liquid-cooled). A chip that delivers 2× performance at 3× power yields fewer useful cycles per rack than a chip that delivers 1.5× performance at 1.5× power.
  • Cooling is paid continuously — every watt dissipated is a watt the cooling plant has to evacuate. Higher perf/watt compounds into lower PUE (power usage effectiveness).
  • TCO dominates CapEx — at multi-year fleet operation, energy costs routinely exceed server purchase cost. Perf/watt directly reduces TCO.

Dropbox's 7th-gen CPU selection

Dropbox evaluated 100+ processors for their 7th-gen compute tier, using four criteria:

  1. Maximum system throughput.
  2. Minimum process latency.
  3. Best price-performance for Dropbox workloads.
  4. Balanced I/O + memory bandwidth.

Benchmarking: SPECintrate, cross-checked with performance per watt and performance per core. The 84-core AMD EPYC 9634 "Genoa" won both axes — not just raw performance.

Paired with perf/core, perf/watt prevents picking an efficient chip that underperforms per-thread (relevant because many Dropbox workloads are latency-sensitive, not just throughput-sensitive). The joint criterion is both high perf/watt and competitive perf/core, not either alone.

Perf/watt as a co-design signal

Once perf/watt is the binding criterion, software choices flow from it:

  • Containerize for density — perf/watt only pays if the server is fully utilized. Bin-packing workloads across the 84 cores/socket converts perf/watt gains into usable throughput.
  • Co-design chassis cooling — a chip at X watts delivered through a chassis that can actually remove those watts beats the same chip in a chassis that throttles under sustained load.
  • Model real-world draw, not nameplate — nameplate TDP overestimates; sustained workload TDP is what perf/watt is evaluated at.

Relationship to rack-level power density

Perf/watt is per-chip. concepts/rack-level-power-density is per-rack. They compose: high perf/watt at the chip level converts to high useful-work-per-rack only if you can deliver that chip's power to every socket in the rack. When Dropbox's real-world-draw modeling showed 16 kW/rack exceeded the 15 kW budget, doubling PDUs (see patterns/pdu-doubling-for-power-headroom) is how perf/watt at the chip translated into useful throughput at the rack.

What this concept does not claim

  • Not "efficiency at any cost." Perf/watt alone would favor low- frequency, high-core-count chips; pair with perf/core to keep per-thread latency competitive.
  • Not a replacement for specific workload benchmarks. Perf/watt on a benchmark ≠ perf/watt on your workload. The codesign loop (see concepts/hardware-software-codesign) is what grounds generic perf/watt in workload-specific perf/watt.

Perf/watt at the other end of the scale: ambient-sensing edge ML

Perf/watt isn't only a datacenter metric. At the opposite end of the scale — always-on ambient sensing devices (hearables, AR glasses, smartwatches, always-on IoT sensors) — perf/watt is the binding metric for exactly the opposite reason: the power ceiling is a few milliwatts, sustained, because the device is battery-powered and thermally constrained against skin contact.

Google Research's 2025-10-15 Coral NPU announcement expresses the design target directly as perf/watt: "~512 GOPS [ML compute] while consuming just a few milliwatts". At this envelope peak GOPS / TOPS is marketing; what the device class can actually use is sustained operations per milliwatt. The ML-first architecture stance Coral NPU advocates — matrix engine first, scalar compute secondary — is an architectural response to perf/watt at mW-class power (Source: sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai).

Seen in

Last updated · 200 distilled / 1,178 read