DROPBOX 2025-08-08 Tier 2


Seventh-generation server hardware at Dropbox: our most efficient and capable architecture yet

Summary

Dropbox's seventh-generation in-house server hardware — replacing the 2020-era sixth-gen Cartman platform — rolled out across five named tiers: Crush (compute), Dexter (database), Sonic (storage), Gumby (mixed-workload GPU), and Godzilla (dense multi-GPU). The refresh is built around three forcing functions: embrace the 2024-era step changes in CPU core counts, 200/400G networking, and HDD areal density; co-develop with suppliers instead of buying off the shelf; and bring software teams into hardware decisions at the requirements stage.

The headline moves:

  • A 48-core → 84-core AMD EPYC (Rome → Genoa) CPU swap with DDR4 → DDR5 and a 25G → 100G NIC (Crush: ~40% SPECintrate gain, +75% cores/socket, 2× RAM/server, same 1U "pizza box", 46 servers/rack).
  • A dual-socket → single-socket Dexter shift with +30% IPC and a 2.1 → 3.25 GHz base clock, driving up to 3.57× less replication lag on Dynovault and Edgestore.
  • A vibration- and acoustically-tuned storage chassis co-designed with drive vendors that enabled early adoption of Western Digital Ultrastar HC690 32 TB 11-platter SMR drives (~10%+ capacity bump per generation).
  • A SAS topology rework to hit >200 Gbps per chassis against an internal floor of 30 Gbps/PB.
  • At the facility level, 2 → 4 PDUs per rack (reusing existing busways plus new receptacles) to lift the per-rack power envelope from 15 kW to >16 kW real-world draw and open headroom for future accelerators.

GPUs arrive as two new tiers: Gumby (Crush-derived, 75–600 W TDP envelope, HHHL + FHFL PCIe, for video transcoding / embeddings / inference) and Godzilla (up to 8 interconnected GPUs, for LLM fine-tuning / high-throughput ML training).

Dropbox operates roughly tens of thousands of servers with millions of drives, has grown from 40 PB (2012) → 600 PB (2016) into the "exabyte era", runs >99% of its storage fleet on SMR, and has kept >90% of stored data in self-operated datacenters since the 2015 Magic Pocket migration. Looking forward, the post names HAMR (heat-assisted magnetic recording) and liquid cooling as the next technology steps.

Key takeaways

  1. Software workloads are a forcing function for hardware design, not the other way around. Every section of the post names which software team's workload shaped which hardware decision: Dynovault and Edgestore drove the dual-→single-socket Dexter shift (replication lag dominates their tail); containerized services drove Crush's core-count doubling (bin-packing efficiency); systems/dropbox-dash and video processing drove the creation of Gumby and Godzilla; storage software drove the 30 Gbps/PB → 200 Gbps/chassis SAS-topology rework. Stated explicitly: "we weren't just designing servers, but building platforms that elevated our services." Formalized here as concepts/hardware-software-codesign. (Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware)

  2. Supplier co-development is a strategic lever, not a procurement detail. Across storage (vibration/acoustic chassis), compute (firmware tuning, heatsink/airflow), and the SMR 32 TB drive (first-mover on Ultrastar HC690), Dropbox's story is the same: give suppliers your workload, get early access and firmware customization back. This is a different primitive from "buy off the shelf" and a different primitive from "build it ourselves": it's a long-term co-investment that converts supplier-roadmap position into earlier hardware capability. Formalized as patterns/supplier-codevelopment. (Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware)

  3. Thermals and power, not silicon, are the new bottlenecks. Quoted near-verbatim: "No matter where we looked — compute, storage, or GPU platforms — one thing was clear: power demands are going up." Dropbox's solution was to cap processor TDP per server so they could pack maximum cores into the existing rack power envelope, model real-world draw not nameplate (nameplate overestimates), and — when real-world still exceeded 15 kW/rack — double PDUs from 2 → 4 using existing busways + new receptacles rather than rebuild the facility. Power consumption per petabyte and per core still decreased. Formalized as concepts/rack-level-power-density and patterns/pdu-doubling-for-power-headroom. (Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware)

  4. Higher-capacity drives tighten the acoustic/vibration envelope, not just the IOPS-per-TB envelope. The concepts/hard-drive-physics framing from the 2025 Warfield/S3 post argues capacity-per-drive scales exponentially while IOPS-per-drive stays roughly constant (~120 IOPS/drive, flat since 2006). Dropbox adds a second structural constraint: as drives hit 30+ TB, the read/write head's nanometer precision leaves vanishing margin against the vibration of 10k-RPM fans packed into a denser chassis. Vibration induces position error signal (PES) events; worst case, a write fault → drive retry → latency spike + IOPS drop. Meanwhile drives age fastest above ~40 °C, so you can't just slow the fans. The co-developed chassis explicitly trades fan-curve tuning + airflow redirection + acoustic damping against this axis. (Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware)

  5. SMR is now the dominant format of Dropbox's storage fleet — >99%. This is a major industry data point: shingled magnetic recording started as an experimental tier for cold data and has, at Dropbox specifically, eaten the generalist storage workload entirely. The post cites their 2022 four-years-of-SMR retrospective which charted the 25% → 99% migration. SMR's higher density is what made the 32 TB Ultrastar HC690 viable; SMR is also what narrows vibration tolerance, linking this takeaway directly to (4). (Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware)

  6. Compute and database platforms converged onto a single system vendor platform. Early in Crush/Dexter design Dropbox realized the requirements overlapped enough to reuse one vendor platform for both tiers, simplifying components, firmware, drivers, and OS updates. Dexter differentiates via a single-socket (vs Crush's dual-socket) SKU — eliminating inter-socket communication latency for databases where replication lag is the dominant tail driver — while sharing everything else. A consolidation move framed as operational-complexity reduction at fleet scale, not as a hardware-architecture novelty. (Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware)

  7. Storage throughput engineering is a bandwidth-per-capacity problem, not a per-drive problem. Internal floor: 30 Gbps per PB of data. Expected future systems: >100 Gbps per PB. Design target: >200 Gbps per chassis. This inverts the usual per-drive IOPS/throughput framing; Dropbox cares about whether the aggregate drives in a chassis can deliver proportional bandwidth as capacity climbs, which means the SAS topology (how drives attach to the HBA/expander) becomes the scaling axis, not the drive interface per se. Paired with a new 400G-ready datacenter fabric on the network side, described in an earlier Dropbox post. (Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware)

  8. GPU tiers are split into "flexibility" and "density". Gumby is Crush + PCIe GPU slots with an intentionally wide TDP envelope (75–600 W) and both HHHL and FHFL form factors — optimized for mixed inference / embeddings / transcoding workloads that vary widely in accelerator sizing. Godzilla is dense multi-GPU (up to 8, interconnected) for LLM training and fine-tuning. The split encodes a general design principle: accelerator platforms should be planned as product tiers keyed to workload shape, not as a single "GPU server" SKU. (Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware)

  9. "Real-world modeling beats nameplate budgeting" is an actionable methodology. Dropbox models actual server draw (~16 kW/cabinet under the new workload mix) rather than the manufacturer nameplate (routinely overestimates). That number is what triggered the 4-PDU move. Generalizes: any capacity-planning step that consumes a nameplate max as a hard budget number systematically over-provisions facility power; a workload-shape-aware model unlocks 10–20% headroom. (Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware)

  10. The next-gen roadmap names two forcing functions: HAMR and liquid cooling. Heat-assisted magnetic recording will push areal density further but will tighten the acoustic/thermal envelope still more — reinforcing the co-developed-chassis direction. Liquid cooling moves from "niche" to "necessity" as compute densities climb past the ~600 W TDP point Gumby already supports. Both signal that facility-level primitives (cooling medium, power density, rack form factor) become first-class design variables for future generations, not just variables downstream of chip choice. (Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware)
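The bandwidth-per-capacity floor from takeaway 7 reduces to a simple check. A minimal sketch, assuming a hypothetical drives-per-chassis count (the post does not give one):

```python
def gbps_per_pb(chassis_gbps: float, capacity_pb: float) -> float:
    """Aggregate chassis bandwidth normalized by stored capacity."""
    return chassis_gbps / capacity_pb

def meets_floor(chassis_gbps: float, capacity_pb: float,
                floor: float = 30.0) -> bool:
    """Dropbox's stated internal floor is 30 Gbps per PB."""
    return gbps_per_pb(chassis_gbps, capacity_pb) >= floor

# Hypothetical chassis: 100 x 32 TB drives ~= 3.2 PB raw (drive count is an
# assumption for illustration).
capacity = 100 * 32 / 1000            # PB
print(gbps_per_pb(200.0, capacity))   # 62.5 Gbps/PB at the >200 Gbps design target
print(meets_floor(200.0, capacity))               # clears the 30 Gbps/PB floor
print(meets_floor(200.0, capacity, floor=100.0))  # not yet at the >100 future bar
```

The point of the framing: as drives per chassis grow denser, the check fails unless the SAS topology scales aggregate bandwidth along with capacity.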

Named platforms and numbers

Crush (compute)
  Changes vs gen-6: AMD EPYC 7642 Rome 48c → EPYC 9634 Genoa 84c; DDR4 256 GB → DDR5 512 GB; 25G → 100G NIC; NVMe gen5; same 1U chassis, 46 servers/rack
  Cited numbers: +75% cores/socket, 2× RAM, ~40% SPECintrate gain

Dexter (database)
  Changes vs gen-6: same core count as gen-6; dual-socket → single-socket; +30% IPC; 2.1 → 3.25 GHz base clock
  Cited numbers: up to 3.57× less replication lag on Dynovault and Edgestore

Sonic (storage)
  Changes vs gen-6: co-developed chassis with vibration/acoustic damping, redirected-airflow fan design, SAS topology rework; first-mover on Ultrastar HC690 32 TB SMR (11 platters, 3.5")
  Cited numbers: >200 Gbps/chassis (design target); >10% capacity gain per generation

Gumby (GPU, mixed)
  Changes vs gen-6: new tier; Crush-based + PCIe GPU slots
  Cited numbers: 75–600 W TDP envelope; HHHL + FHFL form factors

Godzilla (GPU, dense)
  Changes vs gen-6: new tier
  Cited numbers: up to 8 interconnected GPUs

Facility-level:
  • Per-rack power: 15 kW → ~16+ kW real-world draw supported; PDUs 2 → 4 per rack using existing busways
  • Power consumption per petabyte and per core decreased even as total rack power increased
  • Network: new 400G-ready DC architecture
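The gap between nameplate budgeting and real-world modeling behind these facility numbers can be sketched. Only the 46 servers/rack and the roughly 16 kW modeled draw come from the post; the per-server wattages below are hypothetical:

```python
SERVERS_PER_RACK = 46     # Crush 1U density, from the post
NAMEPLATE_W = 450.0       # hypothetical per-server PSU nameplate
MEASURED_W = 350.0        # hypothetical measured draw under the real workload mix

nameplate_kw = SERVERS_PER_RACK * NAMEPLATE_W / 1000   # what a nameplate budget reserves
measured_kw = SERVERS_PER_RACK * MEASURED_W / 1000     # what the rack actually pulls

headroom_pct = (nameplate_kw - measured_kw) / nameplate_kw * 100
print(f"nameplate {nameplate_kw:.1f} kW vs measured {measured_kw:.1f} kW: "
      f"{headroom_pct:.0f}% of the facility budget reclaimed by modeling")
```

With these assumed wattages the measured figure lands near the ~16 kW/cabinet the post cites, which is the number that triggered the 4-PDU move.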

Scale context:
  • Tens of thousands of servers, millions of drives
  • 40 PB (2012) → 600 PB (2016) → exabyte era (2025)
  • Since the 2015 Magic Pocket migration, >90% of stored data on Dropbox-managed hardware
  • >99% of storage fleet on SMR

Architecture bits worth extracting

The CPU-selection loop

Dropbox evaluated 100+ processors, filtered by four criteria: maximum system-level throughput, minimum latency for individual processes, best price/performance for Dropbox-specific workloads, and balanced I/O + memory bandwidth. They ran SPECintrate and compared performance per watt and per core. "Balanced" is load-bearing: it's not just raw core counts; an 84-core chip that starves for memory bandwidth would fail criterion four. The 84-core Genoa won on both the max-throughput and per-core-performance axes.
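A minimal sketch of this down-select loop, with illustrative (not real) SKU numbers; the bandwidth-per-core floor stands in for the "balanced" criterion:

```python
from dataclasses import dataclass

@dataclass
class Cpu:
    name: str
    cores: int
    specint_rate: float    # system-level throughput proxy
    tdp_w: float
    mem_bw_gbs: float      # memory bandwidth, GB/s

MIN_BW_PER_CORE = 4.0      # GB/s per core; illustrative floor, not a Dropbox number

def down_select(candidates: list[Cpu]) -> Cpu:
    # Criterion 4: drop chips whose cores would starve for memory bandwidth.
    viable = [c for c in candidates if c.mem_bw_gbs / c.cores >= MIN_BW_PER_CORE]
    # Rank survivors on performance per watt (proxy for criteria 1 and 3).
    return max(viable, key=lambda c: c.specint_rate / c.tdp_w)

# Illustrative entries, not real SKU data:
field = [
    Cpu("chip-84c", 84, 1500.0, 290.0, 460.0),
    Cpu("chip-96c", 96, 1550.0, 360.0, 460.0),
]
print(down_select(field).name)
```

The filter-then-rank shape is the key idea: a chip can win raw throughput and still be eliminated by the balance criterion before perf/watt is ever compared.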

The single-socket database argument

Dual-socket systems pay inter-socket latency on every cache coherence miss that crosses sockets. For OLTP-shaped databases (write on primary, replicate to secondary), that latency shows up directly as replication lag. Going single-socket, combined with the +30% IPC and the higher base clock, compounded into the 3.57× replication-lag reduction. Named beneficiaries: Dynovault and Edgestore.
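The silicon-side factors compound multiplicatively; a quick check shows they explain roughly a 2× speedup on their own, leaving the rest of the cited 3.57× plausibly attributable to the eliminated cross-socket latency (the post does not decompose the figure):

```python
# Per-core throughput factors from the post:
ipc_gain = 1.30            # +30% IPC
clock_gain = 3.25 / 2.1    # base clock ratio, GHz

per_core_speedup = ipc_gain * clock_gain
print(f"{per_core_speedup:.2f}x from IPC x clock alone")
# The cited "up to 3.57x less replication lag" exceeds this, which is
# consistent with the removed cross-socket latency doing the remaining work.
```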

The vibration/acoustic design brief

Drive head operates with nanometer precision over a flying gap roughly two sheets of paper thick (cf concepts/hard-drive-physics — Warfield's 747-over-grass analogy). Fan RPM >10k and a denser chassis mean tighter vibration coupling. PES = position error signal; cumulative PES → write fault → drive retry → latency + IOPS degradation. Drive temperature sweet spot is ~40 °C: below it, extra cooling buys no reliability, so any fan speed spent getting there is wasted vibration; above it, drives age faster and error rates rise. Co-developed chassis addresses: (1) vibration control via acoustical isolation and damping, (2) thermals via fan control + airflow redirection, (3) future-proofing for next-gen drive form factors. This is an instance of concepts/heat-management applied at the chassis/mechanical level rather than at the multi-tenant-placement level S3 operates at — complementary framings.
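The fan-curve tradeoff can be sketched as picking the lowest RPM that keeps drives at or under the ~40 °C sweet spot, since slower fans mean less vibration coupled into the heads. The cooling model below is a toy linear assumption, not vendor data:

```python
def choose_fan_rpm(ambient_c: float,
                   rpm_options=(6000, 8000, 10000, 12000)) -> int:
    """Pick the lowest fan RPM whose modeled drive temp is <= 40 C."""
    def drive_temp(rpm: int) -> float:
        # Toy linear model (assumption): drives sit 20 C over ambient at
        # rest, and each extra 1000 RPM removes ~1.5 C.
        return ambient_c + 20.0 - 1.5 * (rpm / 1000)
    for rpm in sorted(rpm_options):
        if drive_temp(rpm) <= 40.0:
            return rpm
    return max(rpm_options)   # thermal limit wins if nothing qualifies

print(choose_fan_rpm(25.0))   # cool room: slow fans suffice
print(choose_fan_rpm(35.0))   # hot room: the vibration budget shrinks
```

The shape of the function is the point: hotter ambient forces higher RPM, which is exactly the vibration the damped chassis has to absorb.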

The PDU-doubling trick

Conventional move: higher power budget requires facility rework (bigger busways, new wiring, possibly new cabinets). Dropbox's move: keep the existing busways, add more receptacles, run more PDUs per rack. 2 → 4. Effectively doubles deliverable power without rebuilding the facility. Tradeoffs not fully enumerated in the post — presumably rack density and cable management both got tighter — but the outcome is 16 kW served from infrastructure that was nominally a 15 kW facility. Generalizes as patterns/pdu-doubling-for-power-headroom.
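A sketch of the deliverable-power arithmetic, assuming single-phase 30 A / 208 V PDUs with the standard 80% continuous-load derate — the post gives none of these electrical specifics:

```python
def rack_power_kw(pdus: int, pdu_amps: float = 30.0, volts: float = 208.0,
                  derate: float = 0.8) -> float:
    """Deliverable rack power in kW. 0.8 is the usual NEC continuous-load
    derate; amperage and voltage here are assumptions for illustration."""
    return pdus * pdu_amps * volts * derate / 1000

print(rack_power_kw(2))   # baseline with 2 PDUs
print(rack_power_kw(4))   # doubled PDUs double deliverable power
```

Whatever the actual PDU ratings, the relationship is linear: doubling PDU count on existing busways doubles deliverable power without touching the facility.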

Systems introduced or surfaced

  • systems/magic-pocket — Dropbox's in-house block storage, the 2015 Amazon S3 exit destination; operates the hardware described here at exabyte scale, >99% on SMR.
  • systems/smr-drives — Shingled magnetic recording; >99% of Dropbox's storage fleet; enables higher-density drives like the 32 TB Ultrastar HC690, at the cost of track-overlap-driven write-amplification for random writes (hence the filesystem+workload shaping on top).
  • systems/crush — 7th-gen compute platform; 84-core Genoa in 1U.
  • systems/dexter — 7th-gen database platform; single-socket Genoa; same vendor platform as Crush.
  • systems/sonic — 7th-gen storage platform; co-developed vibration/thermal chassis for 30+ TB SMR.
  • systems/gumby — 7th-gen flexible GPU tier.
  • systems/godzilla — 7th-gen dense multi-GPU tier.
  • systems/dropbox-dash — The AI product whose workload shape forced the GPU tiers to exist.
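The random-write penalty mentioned for systems/smr-drives can be illustrated with a toy shingled-zone model (track counts assumed, not real drive geometry):

```python
def smr_rewrite_cost(zone_tracks: int, target_track: int) -> int:
    """Toy shingled-zone model: tracks overlap like roof shingles, so
    overwriting track k forces rewriting k and every track shingled on
    top of it through the end of the zone. Tracks are 0-indexed."""
    return zone_tracks - target_track

zone = 100   # tracks per zone: an assumed number for illustration
print(smr_rewrite_cost(zone, zone - 1))   # sequential append: 1 track written
print(smr_rewrite_cost(zone, 0))          # worst-case random overwrite: whole zone
```

This asymmetry is why the filesystem and workload shaping sit on top: keep writes sequential within zones and SMR's density comes nearly free.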

Concepts surfaced

  • concepts/hardware-software-codesign — Naming the practice: hardware requirements gathered from software teams before silicon selection, software workload shape fed into chassis/firmware tuning. Dropbox's 7th-gen rollout is an end-to-end instance.
  • concepts/performance-per-watt — Explicit selection criterion in Dropbox's CPU-down-select, not raw performance; paired with per-core perf to avoid picking an energy-efficient chip that underperforms per-thread.
  • concepts/rack-level-power-density — The actual scarce resource: kW/rack, not kW/server. Dropbox models real-world draw against the facility-level budget and adapts the power-distribution topology to fit.
  • concepts/hard-drive-physics (existing) — Second source confirming the Warfield/S3 framing; Dropbox adds the vibration-envelope constraint at 30+ TB that Warfield's IOPS/capacity framing doesn't cover.
  • concepts/heat-management (existing) — Extends the concept from S3's multi-tenant-placement framing to Dropbox's mechanical/chassis framing: same concept applied at a different layer of the stack.

Patterns surfaced

  • patterns/supplier-codevelopment — Long-horizon supplier relationship as a hardware-capability lever: workload telemetry → supplier firmware/hardware customization → early access. Dropbox on storage chassis, on SMR 32 TB drive, on compute firmware. Complements — does not replace — patterns/hackathon-to-platform (own-everything) or off-the-shelf procurement.
  • patterns/pdu-doubling-for-power-headroom — When per-rack power budget is the bottleneck and facility rebuild isn't on the table, duplicate PDU count per rack on existing busways. Concrete 2 → 4 PDU move at Dropbox; reusable template.

Caveats

  • Self-reported, with a promotional tone — dropbox.tech publishes post hoc; there are no independent benchmarks of Crush/Dexter/Sonic vs gen-6, just the 40% SPECintrate / 3.57× replication lag / >10% capacity gain figures Dropbox publishes.
  • No cost numbers. "Better performance per watt" and "lowers cost per terabyte" are stated qualitatively. TCO modeling not disclosed.
  • GPU tier details are sparse. Named SKUs behind Gumby and Godzilla (H100? MI300? L40S?) aren't given; the 75–600 W envelope suggests broad NVIDIA SKU coverage but doesn't pin it.
  • Benchmarking methodology not disclosed. SPECintrate is an industry standard but the "3.57× less replication lag" figure is Dropbox-internal; workload and measurement window not specified.
  • The "maintenance burden" side of consolidating Crush + Dexter onto one vendor platform is asserted, not quantified. Simpler ops is plausible; specific incident-reduction / MTTR numbers not given.
  • "PDU doubling" generalization untested — Dropbox's facility had unused busway capacity and receptacle density to absorb 2 → 4 PDUs. Not every datacenter does. The move generalizes; the precondition (existing headroom in the power distribution) limits its applicability.