Llama 3¶
Llama 3 is Meta's April-2024 open-weights foundation-model release (8B, 70B; 405B followed as Llama 3.1). For this wiki, the operationally interesting fact is that Llama 3 was trained on both of Meta's 24K-GPU H100 clusters simultaneously: one built on RoCE (RDMA over Converged Ethernet), one on InfiniBand. The largest Llama 3 model was trained on the RoCE cluster.
"We used both InfiniBand and RoCE clusters to train Llama 3, with the RoCE cluster used for training the largest model." (Source: sources/2024-06-12-meta-how-meta-trains-large-language-models-at-scale)
This is a rare datum in the open literature: a production-scale, like-for-like comparison of AI training fabrics. It is the reason Meta's 2024-06-12 post is architecturally significant.
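In practice, the fabric choice surfaces at the collective-communication layer rather than in the training code itself. A minimal sketch of how an NCCL-based job is typically steered onto a RoCE fabric (not from the Meta post; device names and values are illustrative assumptions, tuned per-cluster in real deployments):

```shell
# Hypothetical NCCL settings for a RoCEv2 fabric; all values are illustrative.
export NCCL_IB_HCA=mlx5_0            # RDMA NIC to use (RoCE NICs enumerate as IB devices)
export NCCL_IB_GID_INDEX=3           # select the RoCEv2 GID entry (pure IB fabrics skip this)
export NCCL_SOCKET_IFNAME=eth0       # interface for NCCL's out-of-band bootstrap traffic
export NCCL_IB_QPS_PER_CONNECTION=4  # multiple QPs can spread flows across ECMP paths on Ethernet
```

On an InfiniBand cluster the same job runs with essentially the same variables minus the GID selection, which is part of why the two fabrics are directly comparable at this scale.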
Seen in (wiki)¶
- Meta — How Meta trains large language models at scale. The 2024-06-12 post is the canonical Meta statement on the Llama 3 training substrate: two 24K-GPU H100 clusters (one RoCE, one InfiniBand), on a modified Grand Teton platform at 700 W, air-cooled. (Source: sources/2024-06-12-meta-how-meta-trains-large-language-models-at-scale)
Relation to Llama 3.1¶
Llama 3.1 is the July-2024 update to the Llama 3 family (8B / 70B / 405B), already documented elsewhere on this wiki as the adaptation base for eBay's e-Llama and Instacart's SRL model. The 2024-06-12 Meta infra post predates Llama 3.1's public release; the substrate it describes is the one Llama 3.1 was subsequently trained on (as of this wiki entry, Meta has not published a separate infrastructure retrospective for 3.1).
Relationship:
- Llama 3 — April 2024, 8B + 70B release.
- Llama 3.1 — July 2024 update, 8B + 70B + 405B; same infra family.
Stub¶
More content to add as Meta publishes further infra retrospectives (ingest candidates: Meta's 2024-03-12 Building Meta's GenAI Infrastructure post, Llama 3.1 model-card disclosures, the Llama 3 Herd paper).
Related¶
- systems/llama-3-1 — the 2024-07 successor family documented elsewhere in the wiki.
- systems/meta-genai-cluster-roce — the cluster that trained the largest Llama 3 model.
- systems/meta-genai-cluster-infiniband — the sibling cluster that trained other Llama 3 variants.
- systems/nvidia-h100 / systems/grand-teton — hardware substrate.
- companies/meta — Meta's company page.