CONCEPT Cited by 3 sources
Bisection bandwidth¶
Definition¶
Bisection bandwidth is the aggregate bandwidth available across the narrowest cut that divides a network in half. For a non-blocking Clos fabric, it equals the sum of the downlink bandwidths of all the hosts on one side — i.e. the fabric is "full-bisection" when any half of the hosts can simultaneously communicate with the other half at their full link speed.
For AI training clusters, bisection bandwidth is the classical measure of whether the fabric can support arbitrary all-to-all communication patterns without becoming the bottleneck — which is exactly what 3D-parallelism collectives (AllReduce, AllGather, ReduceScatter) demand.
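The definition above can be made concrete with a minimal sketch for a two-tier leaf-spine Clos: the fabric is full-bisection when every leaf's uplink capacity matches its host-facing downlink capacity. The link speeds and radices below are illustrative assumptions, not Meta's actual cluster parameters.

```python
def is_full_bisection(hosts_per_leaf, host_link_gbps, uplinks_per_leaf, uplink_gbps):
    """A leaf is non-blocking when its uplink capacity >= its downlink
    capacity; if every leaf (and spine tier) is non-blocking, the fabric
    provides full bisection bandwidth."""
    downlink = hosts_per_leaf * host_link_gbps
    uplink = uplinks_per_leaf * uplink_gbps
    return uplink >= downlink

# Illustrative: 32 hosts at 400 Gb/s need >= 12.8 Tb/s of uplink per leaf.
print(is_full_bisection(32, 400, 32, 400))  # 1:1 -> full bisection
print(is_full_bisection(32, 400, 16, 400))  # 2:1 oversubscribed -> not full
```

The same check applies recursively at each tier of a deeper Clos: the cut with the lowest aggregate capacity is the bisection bottleneck.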
Meta's framing (2024)¶
The concept recurs across three Meta posts:
- 2024-06-12: Meta's two 24K-GPU clusters were differentiated by fabric choice with bisection bandwidth as one axis — the InfiniBand cluster was "optimized for full-bisection bandwidth" while the RoCE cluster was optimized for build speed. After tuning, both achieved equivalent AllReduce performance. (Source: sources/2024-06-12-meta-how-meta-trains-large-language-models-at-scale)
- 2024-08-05: The SIGCOMM 2024 RoCE paper documents the routing evolution that made the RoCE fabric bisection-bandwidth-competitive with the InfiniBand sibling — ECMP → path pinning → E-ECMP + QP scaling. (Source: sources/2024-08-05-meta-a-roce-network-for-distributed-ai-training-at-scale)
- 2024-10-15: Meta's forward-looking design pairs "equal normalized bisection bandwidth" with the ~1 TB/s-per-accelerator injection-bandwidth target. (Source: sources/2024-10-15-meta-metas-open-ai-hardware-vision)
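One common reading of "normalized bisection bandwidth" (an assumption here, not a definition taken from the Meta posts) is the ratio of actual bisection bandwidth to what full bisection would require: N/2 accelerators each injecting at full rate across the cut. A value of 1.0 means full bisection; below 1.0 means oversubscription.

```python
def normalized_bisection(bisection_tbps, num_accels, injection_tbps_per_accel):
    """Ratio of actual bisection bandwidth to the full-bisection
    requirement (N/2 accelerators injecting at full rate across the cut).
    1.0 == full bisection; 0.5 == 2:1 oversubscribed.
    Definition is a common convention, assumed here for illustration."""
    full_bisection = (num_accels / 2) * injection_tbps_per_accel
    return bisection_tbps / full_bisection

# Illustrative: 4096 accelerators at the ~1 TB/s injection target need
# 2048 TB/s across the narrowest cut to score 1.0.
print(normalized_bisection(2048, 4096, 1.0))  # 1.0
print(normalized_bisection(1024, 4096, 1.0))  # 0.5
```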
Why it matters¶
- Defines whether collectives scale. Per-GPU all-reduce throughput at cluster scale is roughly bounded by bisection bandwidth divided by the number of participants; an under-provisioned bisection shows up directly as under-utilised GPUs.
- Non-blocking Clos is the stock answer, but expensive. Full-bisection at AI-cluster scale means a lot of switch silicon and a lot of cabling; Meta's AI Zone template instead provides non-blocking within the Zone and accepts oversubscription across Zones, compensated by topology-aware scheduling.
- Pairs with injection bandwidth. A fabric with high per-accelerator injection bandwidth but an oversubscribed bisection chokes at collective time; Meta's 2024-10 forward-looking design therefore targets both in lockstep.
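A back-of-the-envelope sketch of the first bullet, under two loud assumptions: a bandwidth-optimal ring all-reduce (each of N GPUs moves ~2·S·(N-1)/N bytes) and arbitrary placement, so roughly half of every GPU's traffic crosses the bisecting cut. The numbers are illustrative, not any specific cluster's.

```python
def allreduce_bounds_s(msg_gb, num_gpus, nic_gb_per_s, bisection_gb_per_s):
    """Two lower bounds on all-reduce time (seconds).
    NIC bound: a bandwidth-optimal ring moves ~2*S*(N-1)/N bytes per GPU.
    Fabric bound: with arbitrary placement, ~S*(N-1) GB in total must
    cross the bisecting cut. Both are simplified illustrative models."""
    nic_bound = 2 * msg_gb * (num_gpus - 1) / num_gpus / nic_gb_per_s
    fabric_bound = msg_gb * (num_gpus - 1) / bisection_gb_per_s
    return nic_bound, fabric_bound

# Illustrative: 1 GB all-reduce over 1024 GPUs with 50 GB/s NICs.
n, s, nic = 1024, 1.0, 50.0
full_bisection = (n / 2) * nic              # GB/s across the cut
nic_t, fab_t = allreduce_bounds_s(s, n, nic, full_bisection)
# At full bisection the fabric bound equals the NIC bound: the NICs,
# not the fabric, set the floor.
oversub_t = allreduce_bounds_s(s, n, nic, full_bisection / 4)[1]
# At 4:1 oversubscription the fabric bound is ~4x the NIC bound,
# so GPUs stall waiting on the network.
```

Under this model, full bisection is exactly the point where the fabric stops being the bottleneck; every factor of oversubscription scales the collective's floor by the same factor, which is what topology-aware scheduling tries to claw back.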
Seen in¶
- sources/2024-10-15-meta-metas-open-ai-hardware-vision — forward-looking ~1 TB/s-per-accelerator target with "equal normalized" bisection.
- sources/2024-06-12-meta-how-meta-trains-large-language-models-at-scale — InfiniBand cluster optimised for full-bisection bandwidth, RoCE for build speed.
- sources/2024-08-05-meta-a-roce-network-for-distributed-ai-training-at-scale — SIGCOMM 2024 deep-dive on the routing evolution that unlocks bisection bandwidth on the RoCE fabric.
Related¶
- concepts/injection-bandwidth-ai-cluster — per-accelerator counterpart.
- concepts/fat-flow-load-balancing / concepts/collective-communication-topology-awareness — the techniques that determine whether bisection bandwidth is usable for collectives.
- systems/meta-genai-cluster-infiniband / systems/meta-genai-cluster-roce — canonical 24K-GPU instances.
- systems/meta-dsf-disaggregated-scheduled-fabric — the next-gen fabric targeting equal injection-plus-bisection scaling.
- systems/ai-zone — Meta's Clos topology template with non-blocking-inside-Zone / oversubscribed-across-Zones policy.
- companies/meta.