

Injection bandwidth (AI cluster)

Definition

Injection bandwidth, in the context of an AI training cluster, is the rate at which a single GPU/accelerator can inject data into the network fabric. It bounds the pace at which gradients, activations, KV-cache slices, or tensor shards can move off an accelerator during collective communication, and therefore caps how much parallelism the fabric can support before the network itself becomes the bottleneck.
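As a back-of-envelope illustration of that bound (all numbers here are illustrative assumptions, not figures from the source): the time to move any payload off an accelerator is at least payload size divided by injection bandwidth.

```python
GB = 1e9  # bytes

def min_transfer_time_s(payload_bytes: float, injection_bw_bytes_per_s: float) -> float:
    """Lower bound on the time to move a payload off one accelerator:
    no collective, however well scheduled, can beat payload / injection_bw."""
    return payload_bytes / injection_bw_bytes_per_s

# Assumed example: a 40 GB gradient exchange.
print(min_transfer_time_s(40 * GB, 50 * GB))    # 400 Gb/s (= 50 GB/s) NIC -> 0.8 s
print(min_transfer_time_s(40 * GB, 1000 * GB))  # ~1 TB/s regime -> 0.04 s
```

The order-of-magnitude gap between the two results is exactly the growth Meta's projection below describes.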

Meta's 2024-10 projection

Meta names a target regime for next-generation AI clusters:

"In the next few years, we anticipate greater injection bandwidth on the order of a terabyte per second, per accelerator, with equal normalized bisection bandwidth. This represents a growth of more than an order of magnitude compared to today's networks!" (Source: sources/2024-10-15-meta-metas-open-ai-hardware-vision)

The quote stacks two orthogonal claims:

  • Absolute target: ~1 TB/s per accelerator injection bandwidth.
  • Shape target: bisection bandwidth should scale in lockstep, meaning the fabric must be non-oversubscribed at the cluster level, not just at the pod level.

Why it matters

  • Bounds the communication fraction of training. 3D-parallelism communication is latency-hidden behind compute only up to the point where injection bandwidth saturates; past that point, communication dominates wall-clock time.
  • Dictates fabric generation. Roughly 400 Gb/s per GPU today, 800 Gb/s mid-term, and 1.6 Tb/s plus co-packaged optics for the TB/s-class regime Meta projects.
  • Drives silicon-level investment. Meta's FBNIC, its in-house NIC ASIC, is a response to hitting injection-bandwidth limits with off-the-shelf NICs.
  • Fat-flow problems get worse. As per-accelerator injection bandwidth grows, per-flow throughput grows; fat-flow load balancing and topology-aware collectives become more, not less, critical.
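The first bullet can be made concrete with the standard ring all-reduce cost model: for an S-byte tensor across N accelerators, each link carries about 2(N-1)/N x S bytes, so injection bandwidth gives a lower bound on per-step communication time. The tensor size, cluster size, and compute time below are assumptions for illustration, not Meta's numbers.

```python
GB = 1e9  # bytes

def ring_allreduce_time_s(tensor_bytes: float, n: int, injection_bw: float) -> float:
    """Bandwidth-only lower bound on ring all-reduce time (latency terms ignored):
    each accelerator's link moves 2*(n-1)/n * tensor_bytes."""
    return (2 * (n - 1) / n) * tensor_bytes / injection_bw

# Assumed: 20 GB of gradients, 1,024 accelerators, 400 Gb/s (= 50 GB/s) links.
comm = ring_allreduce_time_s(20 * GB, 1024, 50 * GB)
compute = 0.5  # assumed per-step compute time, seconds
print(comm, "hidden" if comm <= compute else "comm-bound")  # comm-bound at these numbers
```

At ~1 TB/s the same exchange drops to roughly 40 ms and tucks back under the compute time, which is the "communication fraction" argument in the bullet above.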
