NVIDIA GB200 Grace Blackwell Superchip
The NVIDIA GB200 Grace Blackwell Superchip is the compute module at the core of NVIDIA's Blackwell-generation rack-scale AI platform. It combines Blackwell GPUs with the Arm-based Grace CPU on a single module, connected via NVLink-C2C. It is the silicon target of Meta's Catalina rack, which Meta presented at OCP Summit 2024.
Context on this wiki
The GB200 succeeds the NVIDIA H100 (Hopper) generation, which underpins Meta's two 24K-GPU training clusters for Llama 3 (see sources/2024-06-12-meta-how-meta-trains-large-language-models-at-scale). The Blackwell generation's rack-scale design (unified Grace-CPU + Blackwell-GPU packaging, higher per-rack GPU counts, and liquid cooling required for sustained load) is what drove Meta's redesign to the ORv3 HPR (high-power rack), which supports up to 140 kW.
Seen in (wiki)
- Meta Catalina (2024-10). Catalina is "based on the NVIDIA Blackwell platform full rack-scale solution, with a focus on modularity and flexibility. It is built to support the latest NVIDIA GB200 Grace Blackwell Superchip, ensuring it meets the growing demands of modern AI infrastructure." (Source: sources/2024-10-15-meta-metas-open-ai-hardware-vision)
Why it matters
- Rack-scale unit, not chip-scale unit. Prior generations (H100) were reasoned about per-GPU and per-node; GB200 designs assume the rack as the unit of deployment, with a rack-level solution from NVIDIA.
- Liquid cooling as table-stakes. The power envelope of GB200 racks exceeds what air cooling can handle, which forces a generational shift to liquid-cooled infrastructure (Catalina supports up to 140 kW per rack).
- Unified Arm-CPU + GPU. Pairing the Grace CPU with Blackwell GPUs on one Superchip, linked by NVLink-C2C rather than PCIe, changes the CPU/GPU trust and memory model relative to x86-host-plus-PCIe-GPU architectures.
Related
- systems/catalina-rack — Meta's OCP-contributed rack design for GB200.
- systems/nvidia-h100 — the preceding-generation GPU still in production training clusters at Meta.
- systems/grand-teton — the platform that housed the H100 generation.
- systems/nvlink — the intra-node interconnect also used at intra-Superchip level.
- systems/amd-instinct-mi300x — the AMD competitor now supported on Grand Teton.
- companies/meta.