Catalina (Meta OCP AI rack)

Catalina is Meta's next-generation high-powered AI rack, announced at OCP Global Summit 2024. It is based on NVIDIA's Blackwell full rack-scale platform and supports the NVIDIA GB200 Grace Blackwell Superchip. In Meta's hardware lineage it succeeds the air-cooled Grand Teton platform, which underpinned the two 24K-GPU H100 training clusters.

Configuration

Catalina introduces the ORv3 (Open Rack v3) high-power rack (HPR), "capable of supporting up to 140kW". The full solution is liquid-cooled (unlike Grand Teton's air-cooled 700 W H100 configuration) and consists of:

  • A power shelf feeding a compute tray
  • A switch tray
  • The ORv3 HPR chassis
  • The Wedge 400 fabric switch
  • A management switch
  • A battery backup unit
  • A rack management controller
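
As a rough illustration of the configuration above, the rack can be modeled as a small immutable data structure. This is a minimal sketch, not Meta's or OCP's schema: the `Component` and `RackDesign` types, the `fits` and `swap` helpers, and the example loads are all hypothetical names and values chosen for illustration. Only the component names, the 140 kW limit, and the liquid-cooling flag come from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Component:
    """One rack-level building block (hypothetical type for this sketch)."""
    name: str


@dataclass(frozen=True)
class RackDesign:
    """Hypothetical model of a rack: a power envelope plus its components."""
    name: str
    power_limit_kw: float
    liquid_cooled: bool
    components: tuple[Component, ...]

    def fits(self, planned_load_kw: float) -> bool:
        """True if a planned load stays within the rack's power envelope."""
        return 0.0 <= planned_load_kw <= self.power_limit_kw

    def swap(self, old_name: str, new: Component) -> "RackDesign":
        """Return a copy with one named component replaced."""
        if all(c.name != old_name for c in self.components):
            raise KeyError(old_name)
        swapped = tuple(new if c.name == old_name else c for c in self.components)
        return replace(self, components=swapped)


# Component names follow the list above; the power shelf and the compute
# tray it feeds are listed as separate entries here.
catalina = RackDesign(
    name="Catalina",
    power_limit_kw=140.0,  # "up to 140kW" per the source
    liquid_cooled=True,
    components=tuple(
        Component(n)
        for n in (
            "power shelf",
            "compute tray",
            "switch tray",
            "ORv3 HPR chassis",
            "Wedge 400 fabric switch",
            "management switch",
            "battery backup unit",
            "rack management controller",
        )
    ),
)

# Example loads are made-up numbers, used only to exercise the check.
print(catalina.fits(120.0))  # True
print(catalina.fits(150.0))  # False
```

The `swap` helper returns a new design instead of mutating the original, so a variant with, say, a different fabric switch can be compared against the baseline.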

Design principles

"We aim for Catalina's modular design to empower others to customize the rack to meet their specific AI workloads while leveraging both existing and emerging industry standards." (Source: sources/2024-10-15-meta-metas-open-ai-hardware-vision)

Two stated principles:

  • Modularity: consumers of the design can swap compute trays, fabric switches, and cooling stages to match their workload.
  • Flexibility: the rack should accommodate multiple generations of accelerator silicon.

Positioning in Meta's AI-hardware lineage

Catalina breaks from the "data-center cooling infrastructure cannot change quickly" constraint named in the Grand-Teton-H100 2024-06 post. Rack-level power density jumps from Grand Teton's air-cooled envelope of roughly 40 kW or less to 140 kW through a fully liquid-cooled redesign.
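
The size of that jump can be checked with simple arithmetic. The sketch below assumes the approximate 40 kW figure as the Grand Teton upper bound, so the result is a lower bound on the increase. The function and constant names are illustrative only.

```python
CATALINA_KW = 140.0      # "up to 140kW" per the source
GRAND_TETON_KW = 40.0    # approximate air-cooled upper bound cited above


def density_ratio(new_kw: float, old_kw: float) -> float:
    """Ratio of two rack power envelopes, both in kW."""
    if old_kw <= 0:
        raise ValueError("old_kw must be positive")
    return new_kw / old_kw


ratio = density_ratio(CATALINA_KW, GRAND_TETON_KW)
print(f"at least {ratio:.1f}x")  # at least 3.5x
```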

Why it matters

  • First Meta AI rack above 100 kW. It is this wiki's canonical example of a liquid-cooled AI rack above 100 kW, and it complements the 16 kW air-cooled Dropbox datum in concepts/rack-level-power-density at the opposite end of the power-density spectrum.
  • OCP-contributed. Catalina is being contributed to OCP, so the design can propagate to other hyperscalers and NCPs and is not Meta-proprietary.
  • Blackwell-generation proof point. Catalina is one of the first publicly detailed rack-scale Blackwell platforms outside NVIDIA reference designs.