
CONCEPT

Hardware/software co-design

Definition

Hardware/software co-design is the practice of shaping hardware requirements and software design together, as one iterative loop, rather than in sequence. Software workload shape drives hardware decisions (chip selection, chassis design, firmware tuning); hardware constraints drive software decisions (storage layout, concurrency model, placement policy, bandwidth targets).

The opposite default — hardware as commodity input to software — treats servers as a fungible resource and buys off-the-shelf. That works at small to mid-scale; at hyperscale and near-hyperscale, the cost and performance gap between a co-designed platform and a commodity platform is large enough that co-design becomes a structural advantage.

Load-bearing framings

From the software side

  • What workload shape does this generation's software actually have? Bin-packing for containerized services? OLTP with replication-lag-sensitive tails? Append-only log-structured storage? High-fan-out embedding generation? Each shape selects different hardware characteristics.
  • What hardware features would materially move the needle? Ask software teams this before selecting silicon. Dropbox's 7th-gen rollout quotes: "we brought in our software teams early to find out what would actually move the needle."
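The workload-shape question above can be sketched as a simple selection table. This is a minimal illustration of the mapping, not a taxonomy; the shape names and hardware attributes are drawn from the examples in this section:

```python
# Illustrative mapping from workload shape to the hardware characteristics
# it tends to select. Entries mirror the examples in this section.
WORKLOAD_TO_HARDWARE = {
    "container_bin_packing":   ["high core count", "large RAM"],
    "replication_lag_oltp":    ["single socket", "high clock speed"],
    "append_only_log_storage": ["high-capacity SMR drives", "vibration-tuned chassis"],
    "embedding_generation":    ["GPU tier", "high east-west bandwidth"],
}

def hardware_for(shape: str) -> list[str]:
    """Return the hardware characteristics a workload shape selects.

    Unrecognized shapes fall back to the commodity default -- i.e. the
    opposite-of-co-design posture described earlier.
    """
    return WORKLOAD_TO_HARDWARE.get(shape, ["commodity default"])
```

The point of the sketch: the mapping runs from software shape to hardware attribute, not the other way around.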

From the hardware side

  • What constraints does this generation's hardware introduce? Higher-capacity SMR drives → tighter vibration tolerance. Higher-core-count CPUs → higher rack power. Denser chassis → harder cooling. These force software response: append-only filesystems, rack-power-aware placement, co-designed chassis.
  • What firmware / customization can the vendor provide? See patterns/supplier-codevelopment.
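One of the software responses named above, rack-power-aware placement, can be sketched in a few lines. This is a hypothetical minimal scheduler, not any real system's policy; all names and numbers are illustrative:

```python
# Minimal sketch of rack-power-aware placement: a workload is scheduled
# only onto a rack whose remaining power budget can absorb its estimated
# draw. `racks` maps rack name -> current draw in kW. Hypothetical.
def place(racks, budget_kw, draw_kw):
    """Pick the rack with the most power headroom that still fits draw_kw."""
    candidates = {name: budget_kw - used
                  for name, used in racks.items()
                  if budget_kw - used >= draw_kw}
    if not candidates:
        return None  # no rack can absorb the load without breaking budget
    rack = max(candidates, key=candidates.get)
    racks[rack] += draw_kw  # commit the placement
    return rack

racks = {"r1": 12.0, "r2": 9.5}        # current draw per rack, kW
chosen = place(racks, budget_kw=15.0, draw_kw=4.0)  # -> "r2" (most headroom)
```

Higher-core-count CPUs raise per-server draw, which is exactly what makes a placement policy like this necessary at all.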

Example: Dropbox's 7th-generation rollout

Dropbox's 2025 hardware refresh is end-to-end co-designed. Each platform tier traces back to a software workload whose needs shaped it:

| Hardware decision | Software driver |
| --- | --- |
| systems/crush: 84-core Genoa, 2× RAM | Container bin-packing; service density |
| systems/dexter: single-socket, high clock | Dynovault + Edgestore replication lag dominates tails |
| systems/sonic: vibration/acoustic-tuned chassis | 30+ TB SMR drives in systems/magic-pocket require mechanical tolerance |
| systems/gumby + systems/godzilla: GPU tiers | systems/dropbox-dash AI workloads |
| 200 Gbps/chassis SAS topology | Storage-software aggregate bandwidth needs |
| PDU doubling (2→4) | Real-world draw exceeds 15 kW budget |
| 400G-ready network | Growing east-west traffic |
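The PDU-doubling row can be made concrete with a back-of-envelope check. The 15 kW budget is from the table; the per-server figures below are hypothetical stand-ins, not Dropbox's actual numbers:

```python
# Hypothetical check: does real-world draw exceed a 15 kW rack budget?
# Per-server nameplate power and utilization are illustrative only.
RACK_BUDGET_KW = 15.0

def rack_draw_kw(servers, nameplate_w, utilization):
    """Estimate real-world rack draw from nameplate power and utilization."""
    return servers * nameplate_w * utilization / 1000.0

draw = rack_draw_kw(servers=24, nameplate_w=800, utilization=0.85)  # 16.32 kW
over_budget = draw > RACK_BUDGET_KW  # True -> provision more PDUs (2 -> 4)
```

The general shape of the problem: nameplate budgets are set at design time, but denser chips push real-world draw past them, and the fix (more PDUs) is a hardware response to a software-driven density choice.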

Stated explicitly:

We weren't just designing servers, but building platforms that elevated our services.

Example: Google's Coral NPU at edge-ML scale

Google Research's 2025-10-15 Coral NPU announcement applies the same methodology at a vastly different power scale. The workload shape (on-device ML inference in always-on ambient-sensing devices — hearables, AR glasses, smartwatches) drives the hardware decisions (ML matrix engine prioritized over scalar compute — the ML-first architecture stance); the hardware constraints (~512 GOPS at a few milliwatts) drive the software decisions (quantized ML models, RISC-V-compliant toolchain built on open compilers rather than proprietary SDKs) (Source: sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai).
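The constraint figure implies an efficiency target that can be computed directly. The 512 GOPS number is from the announcement; the 3 mW figure below is an assumption standing in for "a few milliwatts", not a published spec:

```python
# Efficiency implied by ~512 GOPS at a few milliwatts.
# power_mw = 3.0 is an assumed stand-in for "a few milliwatts".
gops = 512
power_mw = 3.0  # assumed, not a published figure
tops_per_watt = (gops / 1000.0) / (power_mw / 1000.0)  # ~170.7 TOPS/W
# An efficiency in this range is far beyond what general-purpose scalar
# cores deliver, which is why the matrix engine is prioritized over
# scalar compute in the architecture.
```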

The delivery shape is explicitly co-design-oriented: Coral NPU is shipped as a reference architecture (reference-hardware-for-software-ecosystem pattern) so the ML software ecosystem has a stable target to build against — attacking the fragmented hardware/software ecosystem problem from both sides simultaneously.

Required preconditions

  1. Scale — co-design is amortized across many servers; at <1,000-server scale it's usually not cost-effective.
  2. Multi-generation hardware roadmap — one-off co-design doesn't pay back; the loop needs to run multiple times to amortize the supplier-engagement overhead and the internal software-hardware coordination.
  3. Own-facility or reserved-capacity datacenters — cloud-rented compute prevents co-design by construction; someone else chose the hardware.
  4. Cross-team collaboration discipline — software, hardware, network, datacenter engineering teams have to meet at planning time, not at rollout.
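Precondition 1 can be sketched as a break-even calculation. All dollar figures below are hypothetical illustrations, not estimates of anyone's actual costs:

```python
# Hypothetical break-even: co-design pays a fixed engineering cost
# (supplier engagement, internal coordination) and earns a per-server
# saving each generation. Figures are illustrative only.
def breakeven_servers(fixed_cost, saving_per_server):
    """Fleet size at which co-design starts beating off-the-shelf."""
    return fixed_cost / saving_per_server

# e.g. $2M of fixed co-design cost and $500 saved per server per
# generation -> a 4,000-server fleet just to break even, which is why
# sub-1,000-server fleets rarely justify it.
n = breakeven_servers(2_000_000, 500)
```

This also motivates precondition 2: the fixed cost recurs more cheaply on later generations, so the loop has to run more than once to pay back.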

Non-examples

  • Pure off-the-shelf procurement — hardware is an input; software adapts.
  • Pure in-house silicon (AWS Nitro/Graviton, Google TPU, Meta MTIA) — still co-design, but the loop includes designing the chip, not just tuning its firmware + chassis. Co-design as a generic pattern subsumes custom-silicon but doesn't require it.
  • Bringing a single tenant onto reserved hardware — that's hardware affinity, not co-design; co-design requires that the hardware's design incorporate the tenant's workload shape.
