NVLink¶
NVLink is NVIDIA's high-bandwidth, low-latency intra-node GPU-to-GPU interconnect. It is the communication substrate that makes tensor parallelism viable at per-layer all-reduce cadence — at hundreds of GB/s per link per direction, an all-reduce of layer activations is small enough to hide under compute. PCIe (tens of GB/s) cannot. NVSwitch extends NVLink into a full GPU-to-GPU crossbar within a chassis.
Contrast with InfiniBand, which is the inter-node fabric: NVLink runs within a single node; InfiniBand runs between nodes. Together they form the substrate for 3D parallelism (concepts/3d-parallelism) across a multi-node training cluster.
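A back-of-envelope sketch of why the bandwidth gap matters. The per-direction bandwidth figures below are illustrative assumptions (roughly H100-class NVLink vs. PCIe Gen5 x16), and the activation size is hypothetical, not from the source:

```python
def ring_allreduce_seconds(bytes_per_gpu: float, gpus: int, bw_bytes_per_s: float) -> float:
    """Classic ring all-reduce: each GPU's link carries ~2*(p-1)/p * N bytes."""
    p = gpus
    return 2 * (p - 1) / p * bytes_per_gpu / bw_bytes_per_s

GB = 1e9
activations = 0.5 * GB  # hypothetical per-layer tensor to reduce across 8 GPUs

# Assumed per-direction bandwidths (illustrative, not measured):
nvlink_bw = 450 * GB    # ~aggregate NVLink per GPU, H100 class
pcie_bw = 64 * GB       # ~PCIe Gen5 x16

print(f"NVLink: {ring_allreduce_seconds(activations, 8, nvlink_bw) * 1e3:.2f} ms")
print(f"PCIe:   {ring_allreduce_seconds(activations, 8, pcie_bw) * 1e3:.2f} ms")
```

Under these assumptions the NVLink all-reduce lands in low single-digit milliseconds, small enough to overlap with the layer's compute, while the PCIe version is roughly 7x slower at every single layer.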
Seen in (wiki)¶
- eBay e-Llama training cluster. NVLink is explicitly named as the intra-node interconnect (with InfiniBand for inter-node) on the 480-GPU fleet (60 nodes × 8 H100 80GB GPUs) used for continued pretraining of Llama 3.1 8B and 70B. (Source: sources/2025-01-17-ebay-scaling-large-language-models-for-e-commerce-the-development)
Why it matters¶
- Tensor parallelism lives inside the NVLink domain in practice. Per-layer all-reduce / all-gather at model-training batch sizes is bandwidth-heavy; spanning across nodes (InfiniBand) at the same cadence is usually not viable.
- Pipeline parallelism tolerates the slower inter-node hop: only point-to-point activation transfers at stage boundaries, at much lower communication frequency.
- Practical shape of a 3D-parallel training recipe: TP within the NVLink domain (e.g. 8-way TP across the 8 GPUs in a node), PP across the InfiniBand fabric (e.g. across groups of nodes), DP filling the remaining degree to saturate the fleet.
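The TP-inside-the-node placement above can be sketched as a rank layout. TP=8 matches the 8-GPU NVLink domain per node and the world size matches the 480-GPU fleet; the PP=6 / DP=10 split is a hypothetical choice for illustration, not from the source. The TP-fastest-varying ordering follows the common Megatron-style convention:

```python
from collections import defaultdict

TP, PP, DP = 8, 6, 10           # TP=8 per node; PP/DP split is hypothetical
WORLD = TP * PP * DP            # 480 GPUs, as in the 60-node x 8-GPU fleet

def coords(rank: int) -> tuple[int, int, int]:
    """Global rank -> (dp, pp, tp); TP is the fastest-varying axis."""
    return rank // (TP * PP), (rank // TP) % PP, rank % TP

# With TP fastest-varying, a TP group never straddles a node boundary:
# every member of a given (dp, pp) group lives on the same node (rank // 8),
# so all-reduces within the group stay on NVLink.
nodes_per_group = defaultdict(set)
for r in range(WORLD):
    dp, pp, tp = coords(r)
    nodes_per_group[(dp, pp)].add(r // 8)
assert all(len(nodes) == 1 for nodes in nodes_per_group.values())
```

The design point is the axis ordering: because TP varies fastest, each 8-way TP group is a contiguous block of ranks mapping onto one node's NVLink domain, while PP and DP groups span nodes and ride InfiniBand.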
Stub — expand with specific NVLink generations, bandwidth numbers, NVSwitch topologies, and NVL72/NVL36 rack-scale configurations as more sources cite them.
Related¶
- systems/nvidia-h100 — the GPU that hosts the NVLink endpoints in the eBay cluster.
- systems/infiniband — the inter-node fabric paired with NVLink.
- systems/megatron-lm — the training framework whose 3D parallelism exploits the NVLink + InfiniBand split.
- concepts/tensor-parallelism — the parallelism axis that most depends on NVLink bandwidth.
- concepts/3d-parallelism / concepts/multi-gpu-serving