
Modular rack for multi-accelerator

Context

Hyperscale AI infrastructure must:

  1. Absorb silicon generations faster than the data-hall lifecycle (chassis ~7 years, silicon ~2 years).
  2. Serve multiple accelerator vendors — both for supply resilience and for workload/silicon fit (training vs inference, large-model vs small-model).
  3. Preserve fully integrated system design (unified power, control, compute, fabric) so that deployment remains rapid and reliable at fleet scale.

A naive approach — one chassis per silicon generation per vendor — multiplies engineering cost, operations complexity, and supply risk.

The pattern

Design a single platform chassis with standardised accelerator module slots (via OCP OAM) + standardised host/power/fabric integration; hold the chassis integration stable across silicon generations and across accelerator vendors.

The chassis becomes the stable substrate; the accelerator module is the variable element.
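
The stable-substrate/variable-module split can be sketched in code. This is a hypothetical model, not Meta's software: the class names and power budget are illustrative, and the TDP figures (H100 ~700 W, MI300X ~750 W) stand in for the real envelopes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OamModule:
    """One accelerator in the standardised (OAM-style) module form factor."""
    vendor: str
    silicon: str
    tdp_watts: int

@dataclass(frozen=True)
class Chassis:
    """Stable substrate: power, control, compute, and fabric integration
    are designed once and held constant across silicon generations."""
    name: str
    slots: int
    power_budget_watts: int

    def populate(self, module: OamModule) -> dict:
        # The same chassis accepts any module that fits the slot standard
        # and the power budget; swapping vendors changes no integration.
        if module.tdp_watts * self.slots > self.power_budget_watts:
            raise ValueError(f"{module.silicon} exceeds chassis power budget")
        return {"chassis": self.name, "module": module.silicon, "count": self.slots}

# Same Grand Teton-style chassis, two vendors' silicon (figures illustrative):
grand_teton = Chassis("grand-teton", slots=8, power_budget_watts=8_000)
print(grand_teton.populate(OamModule("NVIDIA", "H100", 700)))
print(grand_teton.populate(OamModule("AMD", "MI300X", 750)))
```

The chassis type never changes; only the `OamModule` value varies — which is the whole pattern.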

Meta's 2024-10 instances

Grand Teton — NVIDIA H100 → AMD MI300X

"Like its predecessors, this new version of Grand Teton features a single monolithic system design with fully integrated power, control, compute, and fabric interfaces. This high level of integration simplifies system deployment, enabling rapid scaling with increased reliability for large-scale AI inference workloads." (Source: sources/2024-10-15-meta-metas-open-ai-hardware-vision)

The Grand Teton platform — 2022 Zion-EX successor, originally NVIDIA-GPU-only — is extended in 2024-10 to host the AMD Instinct MI300X. Same monolithic-integration principle, new accelerator.

Catalina — NVIDIA GB200 Blackwell, liquid-cooled ORv3

"We aim for Catalina's modular design to empower others to customize the rack to meet their specific AI workloads while leveraging both existing and emerging industry standards." (Source: sources/2024-10-15-meta-metas-open-ai-hardware-vision)

Catalina extends the modular-rack-for-multi-accelerator principle into the Blackwell generation + 140 kW liquid-cooled regime. The chassis modularity is now specified at rack-scale, not just platform-scale.

Preconditions

  • Standardised accelerator module form factor — OAM is the OCP standard that makes this feasible.
  • Consistent host/accelerator interface — PCIe, NVLink/InfiniBand interconnect, and power delivery must remain compatible across silicon generations.
  • Vendor willingness to ship to the open standard — NVIDIA H100-SXM, AMD MI300X-OAM, and NVIDIA GB200 rack-scale solution are all publicly announced variants.
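
The three preconditions amount to a compatibility contract between chassis and module. A minimal sketch, assuming a hypothetical contract dict (the field names and the 1,000 W ceiling are illustrative, not from any spec):

```python
# A module is deployable only if form factor, host interface, and power
# delivery all match what the stable chassis integration provides.
CHASSIS_CONTRACT = {
    "form_factor": "OAM",
    "host_interface": "PCIe",
    "max_module_watts": 1000,
}

def deployable(module: dict) -> bool:
    return (
        module["form_factor"] == CHASSIS_CONTRACT["form_factor"]
        and module["host_interface"] == CHASSIS_CONTRACT["host_interface"]
        and module["watts"] <= CHASSIS_CONTRACT["max_module_watts"]
    )

mi300x = {"form_factor": "OAM", "host_interface": "PCIe", "watts": 750}
proprietary = {"form_factor": "custom", "host_interface": "PCIe", "watts": 700}
print(deployable(mi300x))       # True: meets the open-standard contract
print(deployable(proprietary))  # False: proprietary form factor breaks precondition 1
```

Precondition 3 (vendor willingness) is what makes the `"OAM"` branch non-empty in practice: a contract no vendor ships to checks nothing.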

Trade-offs

  • Chassis must over-engineer for multiple use cases. A single-vendor chassis can be tuned precisely for one GPU; a multi-accelerator chassis has to accept some generic-design overhead.
  • Thermal envelope must be set for the hottest supported silicon. Cooling design must handle MI300X, H100, or GB200 without redesign — or accept that a new rack generation (Catalina vs Grand Teton) is needed when a silicon generation exceeds the chassis's envelope.
  • Supply chain is simpler at the chassis level but still per-silicon at the accelerator level. The pattern shifts where supply risk lives; it doesn't eliminate it.
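
The thermal-envelope trade-off reduces to a worst-case calculation: the chassis cooling budget is sized for the hottest silicon it must ever host, and anything beyond that forces a new rack generation. A sketch with illustrative TDP figures (the GB200 number is a placeholder, not a spec value):

```python
# Cooling is sized for the worst case across every supported accelerator,
# so cooler modules carry the generic-design overhead named above.
SUPPORTED_TDP_WATTS = {"H100": 700, "MI300X": 750}

def cooling_envelope(supported: dict) -> int:
    """Chassis cooling budget = max TDP across supported silicon."""
    return max(supported.values())

envelope = cooling_envelope(SUPPORTED_TDP_WATTS)
print(envelope)  # 750 -- set by MI300X; over-provisioned for H100

# A generation beyond the envelope forces a new rack design
# (Catalina vs Grand Teton); illustrative figure for GB200:
gb200_watts = 1200
needs_new_rack = gb200_watts > envelope
print(needs_new_rack)  # True
```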