CONCEPT Cited by 3 sources
Liquid-cooled AI rack
Definition
A liquid-cooled AI rack is a rack-scale compute unit in which heat from the GPUs/accelerators is removed via a liquid coolant loop (water, dielectric fluid, or two-phase) rather than by forced-air convection. Liquid cooling becomes necessary at rack-level power envelopes above roughly 30–50 kW — beyond the sustained air-cooling envelope even with aggressive CRAH (computer-room air handler) capacity and hot-aisle containment.
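The air-vs-liquid gap comes straight from the heat-transport equation Q = ρ·V̇·c_p·ΔT: water carries vastly more heat per unit volume than air. A minimal sketch, using textbook property values and an assumed 10 K coolant temperature rise (both illustrative, not from the sources):

```python
# Volumetric coolant flow needed to remove a given rack heat load.
# Q = rho * V_dot * c_p * dT  =>  V_dot = Q / (rho * c_p * dT)
# Property values and the 10 K temperature rise are illustrative assumptions.

def flow_m3_per_s(q_watts, rho_kg_m3, cp_j_kg_k, delta_t_k):
    """Volumetric flow (m^3/s) to carry q_watts at a delta_t_k temperature rise."""
    return q_watts / (rho_kg_m3 * cp_j_kg_k * delta_t_k)

Q = 40_000   # W -- roughly a Grand Teton-class air-cooled rack envelope
DT = 10      # K -- assumed coolant temperature rise

air   = flow_m3_per_s(Q, rho_kg_m3=1.2,    cp_j_kg_k=1005, delta_t_k=DT)
water = flow_m3_per_s(Q, rho_kg_m3=1000.0, cp_j_kg_k=4186, delta_t_k=DT)

print(f"air:   {air:.2f} m^3/s (~{air * 2118.88:.0f} CFM)")
print(f"water: {water * 1000:.2f} L/s")
print(f"ratio: ~{air / water:,.0f}x more volume of air")
```

At 40 kW the same ΔT takes roughly 3.3 m³/s of air versus under a litre per second of water — a factor of a few thousand — which is why pushing past ~50 kW with air alone stops being practical.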
Why AI racks force the transition
Per-GPU TDP has risen from H100-class ~700 W to Blackwell-class ~1 kW+. At 36–72 GPUs per rack plus host CPUs, switches, and NICs, rack-level power envelopes exceed 100 kW — firmly past the air-cooling limit.
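The rack-envelope claim is simple addition. A back-of-envelope tally, where the component counts and per-part draws are illustrative assumptions (not vendor specs):

```python
# Back-of-envelope power tally for a GB200-class AI rack.
# Counts and per-part wattages are illustrative assumptions.

parts = {
    "gpu":      (72, 1200),  # Blackwell-class accelerators, ~1 kW+ each
    "cpu":      (36, 300),   # host CPUs
    "switch":   (9,  800),   # rack-scale switch trays (assumed)
    "nic_misc": (72, 50),    # NICs, fans, management overhead (assumed)
}

total_w = sum(count * watts for count, watts in parts.values())
print(f"rack envelope: ~{total_w / 1000:.0f} kW")
```

Even with conservative guesses for everything besides the GPUs, the total lands above 100 kW — the GPUs alone contribute ~86 kW.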
Meta's 2024-06 Grand Teton H100 training post explicitly named the cooling-infrastructure constraint:
"Since we did not have time to change the cooling infrastructure, we had to remain in an air-cooled environment. The mechanical design had to change… All of these hardware-related changes were challenging because we had to find a solution that fit within the existing resource constraints."
Four months later, Meta's 2024-10 OCP AI-hardware vision post announced the break: Catalina at 140 kW, liquid-cooled, built on the ORv3 HPR chassis.
The power-density shift
| Deployment | Cooling | Rack envelope |
|---|---|---|
| Dropbox 7th-gen storage (2025-08) | Air | ~16 kW |
| Meta Grand Teton H100 (2024-06) | Air | ~40 kW est. |
| Meta Catalina GB200 (2024-10) | Liquid | 140 kW |
(Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware / sources/2024-06-12-meta-how-meta-trains-large-language-models-at-scale / sources/2024-10-15-meta-metas-open-ai-hardware-vision)
Infrastructure implications
Transitioning a data hall to liquid cooling is not incremental — it requires:
- Coolant distribution units (CDUs) — pump + heat-exchanger units that interface with the facility chilled-water loop.
- Rack manifolds — distribute coolant to each compute tray; contain leak-detection sensors.
- Quick-disconnect fittings — allow hot-swap of compute trays without draining the loop.
- Facility-level retrofits — chilled-water supply, leak containment, redundancy.
This is why Meta could not "just switch" for the H100 generation — the facility constraint had to be retired through capex-scale redesign, not a firmware update.
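The CDU sizing that drives those retrofits falls out of the same heat-transport relation. A sketch for a Catalina-class 140 kW rack, assuming a 10 K supply-to-return temperature rise (an assumption for illustration, not a Meta spec):

```python
# Sizing CDU water flow for a 140 kW liquid-cooled rack.
# Q = m_dot * c_p * dT  =>  m_dot = Q / (c_p * dT)
# The 10 K loop temperature rise is an illustrative assumption.

Q_W  = 140_000   # rack heat load, W
CP   = 4186      # specific heat of water, J/(kg*K)
DT_K = 10        # assumed supply-to-return temperature rise

m_dot = Q_W / (CP * DT_K)   # kg/s
lpm   = m_dot * 60          # ~1 kg of water per litre -> L/min

print(f"required flow: {m_dot:.2f} kg/s (~{lpm:.0f} L/min)")
```

Roughly 200 L/min of chilled water per rack, continuously — flow at that scale is what the facility loop, manifolds, and leak containment have to be engineered around.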
Seen in
- sources/2024-10-15-meta-metas-open-ai-hardware-vision — Catalina at 140 kW; Meta's break from air-cooled AI racks.
- sources/2024-06-12-meta-how-meta-trains-large-language-models-at-scale — the explicit air-cooling-constraint framing that Catalina later relaxes.
- sources/2025-08-08-dropbox-seventh-generation-server-hardware — complementary storage-rack datum at the opposite (~16 kW air) end of the power-density spectrum.
Related
- concepts/rack-level-power-density — the binding constraint liquid cooling lifts.
- concepts/heat-management — thermal-engineering counterpart at the broader discipline level.
- systems/catalina-rack — canonical wiki liquid-cooled AI rack.
- systems/orv3-rack — the HPR chassis specification.
- systems/grand-teton — the air-cooled predecessor.
- systems/nvidia-gb200-grace-blackwell — the silicon forcing the transition.
- companies/meta.