CONCEPT Cited by 2 sources
Performance per watt¶
Definition¶
Performance per watt is the ratio of useful work (throughput, operations completed, queries served) to electrical power consumed. As a hardware-selection criterion, it displaces raw performance because at datacenter scale the binding constraint is power delivery and cooling, not silicon performance.
Why it matters at datacenter scale¶
- Rack power is capped by facility design (typically 15–30 kW per rack in commodity datacenters; higher with liquid cooling). A chip that delivers 2× performance at 3× power yields less useful work per rack than a chip that delivers 1.5× performance at 1.5× power.
- Cooling is paid continuously — every watt dissipated is a watt the cooling plant has to remove, and facility overhead (captured by PUE, power usage effectiveness) multiplies each IT watt into additional facility watts. Higher perf/watt therefore saves more than the chip's own draw.
- OpEx dominates CapEx — over multi-year fleet operation, energy costs routinely exceed the server purchase price, so perf/watt directly reduces TCO.
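The 2×-performance-at-3×-power comparison above can be made concrete with a quick calculation. The 300 W baseline chip and the 15 kW rack cap below are assumptions for illustration, not figures from the sources:

```python
# Sketch: useful work per rack under a fixed power cap (all numbers hypothetical).
RACK_CAP_W = 15_000     # assumed facility budget per rack
BASE_POWER_W = 300.0    # assumed baseline chip draw

chips = {
    "baseline": (1.0, 1.0),                   # (relative perf, relative power)
    "A: 2x perf at 3x power": (2.0, 3.0),
    "B: 1.5x perf at 1.5x power": (1.5, 1.5),
}

rack_throughput = {}
for name, (perf, power) in chips.items():
    sockets = int(RACK_CAP_W // (BASE_POWER_W * power))  # chips that fit under the cap
    rack_throughput[name] = sockets * perf               # useful work per rack
```

Chip A fits only 16 sockets in the rack (32 units of work), while chip B fits 33 (49.5 units) — the less power-hungry chip wins at the rack even though it loses per-socket.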
Dropbox's 7th-gen CPU selection¶
Dropbox evaluated 100+ processors for their 7th-gen compute tier, using four criteria:
- Maximum system throughput.
- Minimum process latency.
- Best price-performance for Dropbox workloads.
- Balanced I/O + memory bandwidth.
Benchmarking: SPECint rate, cross-checked with performance per watt and performance per core. The 84-core AMD EPYC 9634 "Genoa" won on both efficiency axes — perf/watt and perf/core — not just raw performance.
Pairing perf/watt with perf/core prevents picking an efficient chip that underperforms per-thread (relevant because many Dropbox workloads are latency-sensitive, not just throughput-sensitive). The joint criterion is high perf/watt and competitive perf/core, not either alone.
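A minimal sketch of the joint criterion — a chip must clear a floor on both axes, not just one. All scores, core counts, wattages, and thresholds below are hypothetical, not Dropbox's benchmark data:

```python
def perf_per_watt(score, cores, watts):
    return score / watts

def perf_per_core(score, cores, watts):
    return score / cores

# Hypothetical candidates: (throughput score, cores, sustained watts).
candidates = {
    "many-slow-cores": (900, 128, 350),  # great perf/watt, weak per-thread
    "few-fast-cores": (600, 48, 320),    # great perf/core, weak efficiency
    "balanced-84c": (850, 84, 340),      # competitive on both axes
}

# Joint criterion: keep only chips above a floor on BOTH axes (floors illustrative).
viable = [name for name, c in candidates.items()
          if perf_per_watt(*c) >= 2.2 and perf_per_core(*c) >= 9.0]
```

Optimizing either metric alone would pick a different (and worse) chip; only the balanced candidate survives the joint filter.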
Perf/watt as a co-design signal¶
Once perf/watt is the binding criterion, software choices flow from it:
- Containerize for density — perf/watt only pays if the server is fully utilized. Bin-packing workloads across the 84 cores/socket converts perf/watt gains into usable throughput.
- Co-design chassis cooling — a chip at X watts delivered through a chassis that can actually remove those watts beats the same chip in a chassis that throttles under sustained load.
- Model real-world draw, not nameplate — nameplate TDP overestimates sustained draw; perf/watt should be evaluated at the power the workload actually pulls.
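The nameplate-vs-sustained point in the last bullet, as arithmetic. All wattages and the score below are assumed for illustration:

```python
# Perf/watt denominator: measured sustained draw, not vendor nameplate TDP.
NAMEPLATE_TDP_W = 400     # vendor worst-case rating (hypothetical)
SUSTAINED_DRAW_W = 290    # measured under the real workload mix (hypothetical)
SCORE = 850               # throughput on the actual workload (hypothetical)

ppw_nameplate = SCORE / NAMEPLATE_TDP_W   # pessimistic estimate
ppw_sustained = SCORE / SUSTAINED_DRAW_W  # what the fleet actually gets

# Budgeting racks by nameplate also under-fills them:
RACK_BUDGET_W = 15_000
servers_by_nameplate = int(RACK_BUDGET_W // NAMEPLATE_TDP_W)  # conservative count
servers_by_measured = int(RACK_BUDGET_W // SUSTAINED_DRAW_W)  # realistic count
```

With these numbers, nameplate budgeting leaves roughly a quarter of the rack's deliverable power (and the matching throughput) on the table.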
Relationship to rack-level power density¶
Perf/watt is per-chip; concepts/rack-level-power-density is per-rack. They compose: high perf/watt at the chip level converts into high useful-work-per-rack only if you can deliver that chip's power to every socket in the rack. When Dropbox's real-world-draw modeling showed 16 kW/rack against a 15 kW budget, doubling PDUs (see patterns/pdu-doubling-for-power-headroom) was how chip-level perf/watt translated into rack-level useful throughput.
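A sketch of that composition, using the note's 16 kW modeled draw against the 15 kW budget. The per-server draw and the doubled-PDU capacity are assumptions, not figures from the source:

```python
RACK_BUDGET_W = 15_000    # facility power budget per rack (from the note)
MODELED_DRAW_W = 16_000   # real-world draw modeling result (from the note)
SERVER_DRAW_W = 500       # hypothetical sustained draw per server
SERVER_PERF = 1.0         # relative useful work per server

planned_servers = int(MODELED_DRAW_W // SERVER_DRAW_W)  # density the chip enables
deliverable = int(RACK_BUDGET_W // SERVER_DRAW_W)       # density one PDU set supports

# Without more deliverable power, the perf/watt gain strands servers per rack:
stranded = planned_servers - deliverable

# Doubling PDUs (patterns/pdu-doubling-for-power-headroom) lifts the ceiling —
# assuming doubled PDUs roughly double deliverable power:
DOUBLED_BUDGET_W = 2 * RACK_BUDGET_W
fits_after_doubling = planned_servers * SERVER_DRAW_W <= DOUBLED_BUDGET_W
```

The point of the sketch: the stranded servers represent chip-level perf/watt that never becomes rack-level throughput until power delivery catches up.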
What this concept does not claim¶
- Not "efficiency at any cost." Perf/watt alone would favor low-frequency, high-core-count chips; pair with perf/core to keep per-thread latency competitive.
- Not a replacement for specific workload benchmarks. Perf/watt on a benchmark ≠ perf/watt on your workload. The codesign loop (see concepts/hardware-software-codesign) is what grounds generic perf/watt in workload-specific perf/watt.
Perf/watt at the other end of the scale: ambient-sensing edge ML¶
Perf/watt isn't only a datacenter metric. At the opposite end of the scale — always-on ambient sensing devices (hearables, AR glasses, smartwatches, always-on IoT sensors) — perf/watt is the binding metric for exactly the opposite reason: the power ceiling is a few milliwatts, sustained, because the device is battery-powered and thermally constrained against skin contact.
Google Research's 2025-10-15 Coral NPU announcement expresses the design target directly as perf/watt: "~512 GOPS [ML compute] while consuming just a few milliwatts". At this envelope, peak GOPS/TOPS figures are marketing; what this device class can actually use is sustained operations per milliwatt. The ML-first architecture stance Coral NPU advocates — matrix engine first, scalar compute secondary — is an architectural response to perf/watt at mW-class power (Source: sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai).
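The unit conversion the announcement implies — sustained operations per milliwatt. The ~512 GOPS figure is from the source; the exact milliwatt value below is an assumption standing in for "a few milliwatts":

```python
GOPS = 512                # "~512 GOPS" (from the announcement)
ASSUMED_POWER_MW = 5.0    # "a few milliwatts" — hypothetical exact value

gops_per_mw = GOPS / ASSUMED_POWER_MW                     # GOPS per milliwatt sustained
tops_per_watt = GOPS / (ASSUMED_POWER_MW / 1000) / 1000   # same ratio in TOPS/W
```

Because giga/milli and tera/unit differ by the same 10³ factor, GOPS/mW and TOPS/W are numerically identical — which is why the mW-class edge number can be compared directly against datacenter accelerators quoted in TOPS/W.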
Seen in¶
- sources/2025-08-08-dropbox-seventh-generation-server-hardware — explicit CPU-selection criterion; paired with perf/core to avoid favouring efficiency at per-thread latency cost. Datacenter-scale instance.
- sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai — perf/watt as the design target for an always-on ambient-sensing NPU (Coral NPU: ~512 GOPS at a few milliwatts). Edge-scale instance; forces the ML-first chip-design stance.