CONCEPT Cited by 2 sources
Performance per watt¶
Definition¶
Performance per watt is the ratio of useful work (throughput, operations completed, queries served) to electrical power consumed. As a hardware-selection criterion, it displaces raw performance because at datacenter scale the binding constraint is power delivery and cooling, not silicon performance.
Why it matters at datacenter scale¶
- Rack power is capped by facility design (typically 15–30 kW per rack in commodity datacenters; higher with liquid cooling). A chip that delivers 2× performance at 3× power yields less useful work per rack than a chip that delivers 1.5× performance at 1.5× power.
- Cooling is paid continuously — every watt dissipated is a watt the cooling plant has to remove, and facility overhead (captured by PUE, power usage effectiveness) multiplies each IT watt into additional facility watts. Higher perf/watt therefore saves more than the chip's own draw.
- OpEx dominates CapEx — over multi-year fleet operation, energy costs routinely exceed the server purchase price, so perf/watt directly reduces TCO.
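The 2×-performance-at-3×-power comparison above can be made concrete with a quick calculation. The 300 W baseline chip and the 15 kW rack cap below are assumptions for illustration, not figures from the sources:

```python
# Sketch: useful work per rack under a fixed power cap (all numbers hypothetical).
RACK_CAP_W = 15_000     # assumed facility budget per rack
BASE_POWER_W = 300.0    # assumed baseline chip draw

chips = {
    "baseline": (1.0, 1.0),                   # (relative perf, relative power)
    "A: 2x perf at 3x power": (2.0, 3.0),
    "B: 1.5x perf at 1.5x power": (1.5, 1.5),
}

rack_throughput = {}
for name, (perf, power) in chips.items():
    sockets = int(RACK_CAP_W // (BASE_POWER_W * power))  # chips that fit under the cap
    rack_throughput[name] = sockets * perf               # useful work per rack
```

Chip A fits only 16 sockets in the rack (32 units of work), while chip B fits 33 (49.5 units) — the less power-hungry chip wins at the rack even though it loses per-socket.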
Dropbox's 7th-gen CPU selection¶
Dropbox evaluated 100+ processors for their 7th-gen compute tier, using four criteria:
- Maximum system throughput.
- Minimum process latency.
- Best price-performance for Dropbox workloads.
- Balanced I/O + memory bandwidth.
Benchmarking: SPECint rate, cross-checked with performance per watt and performance per core. The 84-core AMD EPYC 9634 "Genoa" won on both efficiency axes — perf/watt and perf/core — not just raw performance.
Pairing perf/watt with perf/core prevents picking an efficient chip that underperforms per-thread (relevant because many Dropbox workloads are latency-sensitive, not just throughput-sensitive). The joint criterion is high perf/watt and competitive perf/core, not either alone.
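A minimal sketch of the joint criterion — a chip must clear a floor on both axes, not just one. All scores, core counts, wattages, and thresholds below are hypothetical, not Dropbox's benchmark data:

```python
def perf_per_watt(score, cores, watts):
    return score / watts

def perf_per_core(score, cores, watts):
    return score / cores

# Hypothetical candidates: (throughput score, cores, sustained watts).
candidates = {
    "many-slow-cores": (900, 128, 350),  # great perf/watt, weak per-thread
    "few-fast-cores": (600, 48, 320),    # great perf/core, weak efficiency
    "balanced-84c": (850, 84, 340),      # competitive on both axes
}

# Joint criterion: keep only chips above a floor on BOTH axes (floors illustrative).
viable = [name for name, c in candidates.items()
          if perf_per_watt(*c) >= 2.2 and perf_per_core(*c) >= 9.0]
```

Optimizing either metric alone would pick a different (and worse) chip; only the balanced candidate survives the joint filter.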
Perf/watt as a co-design signal¶
Once perf/watt is the binding criterion, software choices flow from it:
- Containerize for density — perf/watt only pays if the server is fully utilized. Bin-packing workloads across the 84 cores/socket converts perf/watt gains into usable throughput.
- Co-design chassis cooling — a chip at X watts delivered through a chassis that can actually remove those watts beats the same chip in a chassis that throttles under sustained load.
- Model real-world draw, not nameplate — nameplate TDP overestimates sustained draw; perf/watt should be evaluated at the power the workload actually pulls.
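The nameplate-vs-sustained point in the last bullet, as arithmetic. All wattages and the score below are assumed for illustration:

```python
# Perf/watt denominator: measured sustained draw, not vendor nameplate TDP.
NAMEPLATE_TDP_W = 400     # vendor worst-case rating (hypothetical)
SUSTAINED_DRAW_W = 290    # measured under the real workload mix (hypothetical)
SCORE = 850               # throughput on the actual workload (hypothetical)

ppw_nameplate = SCORE / NAMEPLATE_TDP_W   # pessimistic estimate
ppw_sustained = SCORE / SUSTAINED_DRAW_W  # what the fleet actually gets

# Budgeting racks by nameplate also under-fills them:
RACK_BUDGET_W = 15_000
servers_by_nameplate = int(RACK_BUDGET_W // NAMEPLATE_TDP_W)  # conservative count
servers_by_measured = int(RACK_BUDGET_W // SUSTAINED_DRAW_W)  # realistic count
```

With these numbers, nameplate budgeting leaves roughly a quarter of the rack's deliverable power (and the matching throughput) on the table.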
Relationship to rack-level power density¶
Perf/watt is per-chip; concepts/rack-level-power-density is per-rack. They compose: high perf/watt at the chip level converts into high useful-work-per-rack only if you can deliver that chip's power to every socket in the rack. When Dropbox's real-world-draw modeling showed 16 kW/rack against a 15 kW budget, doubling PDUs (see patterns/pdu-doubling-for-power-headroom) was how chip-level perf/watt translated into rack-level useful throughput.
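A sketch of that composition, using the note's 16 kW modeled draw against the 15 kW budget. The per-server draw and the doubled-PDU capacity are assumptions, not figures from the source:

```python
RACK_BUDGET_W = 15_000    # facility power budget per rack (from the note)
MODELED_DRAW_W = 16_000   # real-world draw modeling result (from the note)
SERVER_DRAW_W = 500       # hypothetical sustained draw per server
SERVER_PERF = 1.0         # relative useful work per server

planned_servers = int(MODELED_DRAW_W // SERVER_DRAW_W)  # density the chip enables
deliverable = int(RACK_BUDGET_W // SERVER_DRAW_W)       # density one PDU set supports

# Without more deliverable power, the perf/watt gain strands servers per rack:
stranded = planned_servers - deliverable

# Doubling PDUs (patterns/pdu-doubling-for-power-headroom) lifts the ceiling —
# assuming doubled PDUs roughly double deliverable power:
DOUBLED_BUDGET_W = 2 * RACK_BUDGET_W
fits_after_doubling = planned_servers * SERVER_DRAW_W <= DOUBLED_BUDGET_W
```

The point of the sketch: the stranded servers represent chip-level perf/watt that never becomes rack-level throughput until power delivery catches up.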
What this concept does not claim¶
- Not "efficiency at any cost." Perf/watt alone would favor low-frequency, high-core-count chips; pair with perf/core to keep per-thread latency competitive.
- Not a replacement for specific workload benchmarks. Perf/watt on a benchmark ≠ perf/watt on your workload. The codesign loop (see concepts/hardware-software-codesign) is what grounds generic perf/watt in workload-specific perf/watt.
Perf/watt at the other end of the scale: ambient-sensing edge ML¶
Perf/watt isn't only a datacenter metric. At the opposite end of the scale — always-on ambient sensing devices (hearables, AR glasses, smartwatches, always-on IoT sensors) — perf/watt is the binding metric for exactly the opposite reason: the power ceiling is a few milliwatts, sustained, because the device is battery-powered and thermally constrained against skin contact.
Google Research's 2025-10-15 Coral NPU announcement expresses the design target directly as perf/watt: "~512 GOPS [ML compute] while consuming just a few milliwatts". At this envelope, peak GOPS/TOPS figures are marketing; what this device class can actually use is sustained operations per milliwatt. The ML-first architecture stance Coral NPU advocates — matrix engine first, scalar compute secondary — is an architectural response to perf/watt at mW-class power (Source: sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai).
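The unit conversion the announcement implies — sustained operations per milliwatt. The ~512 GOPS figure is from the source; the exact milliwatt value below is an assumption standing in for "a few milliwatts":

```python
GOPS = 512                # "~512 GOPS" (from the announcement)
ASSUMED_POWER_MW = 5.0    # "a few milliwatts" — hypothetical exact value

gops_per_mw = GOPS / ASSUMED_POWER_MW                     # GOPS per milliwatt sustained
tops_per_watt = GOPS / (ASSUMED_POWER_MW / 1000) / 1000   # same ratio in TOPS/W
```

Because giga/milli and tera/unit differ by the same 10³ factor, GOPS/mW and TOPS/W are numerically identical — which is why the mW-class edge number can be compared directly against datacenter accelerators quoted in TOPS/W.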
Seen in¶
- sources/2025-08-08-dropbox-seventh-generation-server-hardware — explicit CPU-selection criterion; paired with perf/core to avoid favouring efficiency at per-thread latency cost. Datacenter-scale instance.
- sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai — perf/watt as the design target for an always-on ambient-sensing NPU (Coral NPU: ~512 GOPS at a few milliwatts). Edge-scale instance; forces the ML-first chip-design stance.