CONCEPT Cited by 1 source
Dense Transformer¶
Definition¶
A Dense Transformer is the frontier-LLM architectural shape in which the model is a single transformer stack: every token passes through every parameter at every layer. It contrasts with a Mixture of Experts (MoE), which routes each token to a sparse subset of experts.
Verbatim from Peter Corless (Redpanda, 2026-01-13):
"Anthropic Claude remains a single model, known as a Dense Transformer." (Source: sources/2026-01-13-redpanda-the-convergence-of-ai-and-data-streaming-part-1-the-coming-brick-walls)
As of the post's 2026-01 publication, Claude is the named exception to the industry-wide shift of frontier LLMs to MoE: GPT-4 (8 × 220B experts, per the 2023 George Hotz leak), Google Gemini (MoE since 1.5), and xAI Grok (MoE since Grok-1) are all MoE, leaving Claude as the dense holdout.
Dense vs MoE trade-off (frontier-LLM shape)¶
- Dense: every parameter is active for every token. No routing machinery and a simpler serving stack, but heavier intra-batch contention, and per-token compute grows with total parameter count, so scaling past ~10^12 parameters requires proportional growth in serving compute.
- MoE: per-token top-k routing to a sparse expert subset. Total parameter count can grow without proportional per-token compute, at the cost of routing overhead, load imbalance risks, and more complex serving stacks.
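The trade-off above can be made concrete with a toy sketch. This is illustrative only, with made-up tiny dimensions and a naive softmax router; it is not any real model's code. It shows that a dense FFN touches all of its weights for every token, while a top-k MoE layer holds more total parameters but activates only a fixed fraction of them per token:

```python
# Toy contrast of per-token active parameters: dense FFN vs top-k MoE.
# All sizes here are hypothetical, chosen small for illustration.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32          # tiny hypothetical dims
n_experts, top_k = 4, 2        # MoE: route each token to 2 of 4 experts

x = rng.standard_normal((5, d_model))          # 5 tokens

# Dense: one FFN; every token multiplies through all of its weights.
W1 = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))
dense_out = np.maximum(x @ W1, 0) @ W2
dense_active = W1.size + W2.size               # params touched per token

# MoE: n_experts FFN copies plus a router; each token only touches
# the weights of its top-k experts.
experts = [(rng.standard_normal((d_model, d_ff)),
            rng.standard_normal((d_ff, d_model))) for _ in range(n_experts)]
Wr = rng.standard_normal((d_model, n_experts))  # router weights
logits = x @ Wr
gates = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
topk = np.argsort(logits, axis=1)[:, -top_k:]   # per-token expert choice

moe_out = np.zeros_like(x)
for t in range(x.shape[0]):
    for e in topk[t]:
        w1, w2 = experts[e]
        moe_out[t] += gates[t, e] * (np.maximum(x[t] @ w1, 0) @ w2)

moe_total = sum(w1.size + w2.size for w1, w2 in experts)
moe_active = top_k * (d_model * d_ff + d_ff * d_model)

print(f"dense: total={dense_active}, active/token={dense_active}")
print(f"moe:   total={moe_total}, active/token={moe_active}")
```

With these toy numbers the MoE layer holds 4× the dense layer's parameters but activates only half of them per token, which is the sparse-compute advantage the trade-off describes; the per-token loop also hints at the routing overhead and load-imbalance risk a real serving stack must handle.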
Corless notes that both shapes share the same batch-trained limitation (the frontier-model batch-training boundary): dense or MoE, all are offline-batch pre-trained.
Caveats¶
- Single-sourced on the wiki. Only the Corless 2026-01-13 post canonicalises the Dense vs MoE frontier-LLM split here. Anthropic has not publicly confirmed Claude's architectural shape; the claim is widely reported industry understanding, not a confirmed fact.
- "Dense Transformer" is a contrast term, not a formal category. The wiki uses it as the foil to MoE; in the ML literature the more common term is simply "transformer" with MoE as the variant.
- Stub. This page is a minimal definition for cross-linking; deeper dense-vs-sparse routing comparison is deferred.
Seen in¶
- 2026-01-13 Redpanda — The convergence of AI and data streaming, Part 1 (sources/2026-01-13-redpanda-the-convergence-of-ai-and-data-streaming-part-1-the-coming-brick-walls) — canonical: Claude named as the dense-transformer holdout among frontier LLMs.
Related¶
- concepts/mixture-of-experts — the foil shape.
- systems/transformer — the architecture primitive.
- concepts/frontier-model-batch-training-boundary — the structural limitation both shapes share.
- companies/redpanda — the company whose blog canonicalises this framing.