
Liquid-cooled AI rack

Definition

A liquid-cooled AI rack is a rack-scale compute unit in which heat from the GPUs/accelerators is removed by a liquid coolant loop (water, dielectric fluid, or a two-phase refrigerant) rather than by forced-air convection. Liquid cooling becomes necessary at rack-level power envelopes above roughly 30–50 kW, which is beyond the sustained air-cooling envelope even with aggressive CRAH airflow and hot-aisle containment.
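The physics behind that threshold can be sketched with the heat-transport relation Q = ṁ·cp·ΔT: air's low specific heat and density force enormous volumetric flows at high rack power, while water carries the same heat in a modest pipe flow. The temperature rises and fluid properties below are illustrative assumptions, not vendor specs.

```python
# Back-of-envelope comparison of air vs. water heat removal,
# using Q = m_dot * c_p * delta_T. All values are assumptions.

AIR_CP = 1005.0         # J/(kg*K), specific heat of air
AIR_DENSITY = 1.2       # kg/m^3 at ~20 C
WATER_CP = 4186.0       # J/(kg*K)
WATER_DENSITY = 1000.0  # kg/m^3

def air_flow_m3s(rack_kw, delta_t_k=15.0):
    """Volumetric airflow (m^3/s) to absorb rack_kw with a delta_t_k rise."""
    m_dot = rack_kw * 1000.0 / (AIR_CP * delta_t_k)
    return m_dot / AIR_DENSITY

def water_flow_lpm(rack_kw, delta_t_k=10.0):
    """Water flow (litres/min) to absorb rack_kw with a delta_t_k rise."""
    m_dot = rack_kw * 1000.0 / (WATER_CP * delta_t_k)
    return m_dot / WATER_DENSITY * 1000.0 * 60.0

for kw in (16, 50, 140):
    print(f"{kw:>4} kW: air {air_flow_m3s(kw):5.1f} m^3/s "
          f"vs water {water_flow_lpm(kw):6.1f} L/min")
```

At 140 kW the air side needs on the order of 7–8 m³/s of airflow through a single rack, which is why the transition stops being an airflow-optimization problem and becomes a plumbing problem.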

Why AI racks force the transition

Per-GPU TDP has risen from ~700 W for H100-class parts to more than 1 kW for Blackwell-class parts. At 36–72 GPUs per rack, plus host CPUs, switches, and NICs, the rack-level power envelope exceeds 100 kW, firmly past the air-cooling limit.
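To make that arithmetic concrete, here is a minimal rack power budget, assuming 72 Blackwell-class GPUs at ~1.2 kW each plus illustrative figures for host CPUs, networking, and overhead (none of these component draws come from the source):

```python
# Rough rack power budget for a 72-GPU Blackwell-class rack.
# All component draws are assumptions for illustration.

gpus = 72
gpu_tdp_w = 1200        # Blackwell-class accelerator, ~1.2 kW (assumed)
cpu_w = 36 * 300        # host CPUs, assumed 36 sockets @ ~300 W
nic_switch_w = 5000     # NICs + in-rack switching (assumed)
overhead = 1.10         # pumps/fans, VRM losses, misc (~10%, assumed)

rack_kw = (gpus * gpu_tdp_w + cpu_w + nic_switch_w) * overhead / 1000
print(f"Estimated rack envelope: {rack_kw:.0f} kW")
```

Even with conservative assumptions the total lands comfortably above 100 kW, in the same regime as Meta's 140 kW Catalina figure.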

Meta's 2024-06 Grand Teton H100 training post explicitly named the cooling-infrastructure constraint:

"Since we did not have time to change the cooling infrastructure, we had to remain in an air-cooled environment. The mechanical design had to change… All of these hardware-related changes were challenging because we had to find a solution that fit within the existing resource constraints."

Four months later, Meta's 2024-10 OCP AI-hardware vision post announced the break: Catalina, a 140 kW liquid-cooled rack built on the ORv3 high-power rack (HPR) chassis.

The power-density shift

Deployment                           Cooling   Rack envelope
Dropbox 7th-gen storage (2025-08)    Air       ~16 kW
Meta Grand Teton H100 (2024-06)      Air       ~40 kW (est.)
Meta Catalina GB200 (2024-10)        Liquid    140 kW

(Source: sources/2025-08-08-dropbox-seventh-generation-server-hardware / sources/2024-06-12-meta-how-meta-trains-large-language-models-at-scale / sources/2024-10-15-meta-metas-open-ai-hardware-vision)

Infrastructure implications

Transitioning a data hall to liquid cooling is not incremental — it requires:

  • Coolant distribution units (CDUs) — pump + heat-exchanger units that interface with the facility chilled-water loop.
  • Rack manifolds — distribute coolant to each compute tray; contain leak-detection sensors.
  • Quick-disconnect fittings — allow hot-swap of compute trays without draining the loop.
  • Facility-level retrofits — chilled-water supply, leak containment, redundancy.
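The CDU's job can be sketched numerically: it couples a secondary (rack) loop to the facility chilled-water loop, and each side is sized by Q = ṁ·cp·ΔT. The loop temperature rises below are assumptions for illustration, not figures from the source.

```python
# Sketch of CDU flow sizing for a 140 kW rack. The heat exchanger
# moves rack heat from the secondary loop to the facility loop;
# each loop's flow follows Q = m_dot * c_p * delta_T.

WATER_CP = 4186.0  # J/(kg*K)

def loop_flow_lpm(heat_kw, delta_t_k):
    """Water flow (L/min) needed to carry heat_kw across delta_t_k."""
    m_dot_kg_s = heat_kw * 1000.0 / (WATER_CP * delta_t_k)
    return m_dot_kg_s * 60.0  # 1 kg of water ~ 1 litre

rack_kw = 140.0
secondary_lpm = loop_flow_lpm(rack_kw, delta_t_k=10.0)  # rack loop, assumed 10 K rise
facility_lpm = loop_flow_lpm(rack_kw, delta_t_k=6.0)    # chilled-water side, assumed 6 K

print(f"Secondary loop: {secondary_lpm:.0f} L/min")
print(f"Facility loop:  {facility_lpm:.0f} L/min")
```

Note the facility side needs more flow because chilled-water loops typically run a smaller temperature delta; this is one reason the retrofit reaches all the way back to facility piping and pump capacity.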

This is why Meta could not "just switch" for the H100 generation: the facility constraint had to be retired through capex-scale redesign, not a firmware update.
