CONCEPT Cited by 1 source

Dense Transformer

Definition

Dense Transformer is the frontier-LLM architectural shape in which the model is a single transformer stack: every token passes through every parameter at every layer. It contrasts with a Mixture of Experts (MoE) architecture, which routes each token to a sparse subset of experts.
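The "every token through every parameter" property can be made concrete with a minimal NumPy sketch of the dense feed-forward path (hypothetical toy shapes; attention, residuals, and normalization are omitted for brevity):

```python
import numpy as np

def dense_forward(tokens, layers):
    """Dense transformer FFN sketch: every token is multiplied through
    the full weight matrices of every layer -- no routing, no sparsity."""
    h = tokens
    for w_up, w_down in layers:
        h = np.maximum(h @ w_up, 0.0) @ w_down  # all parameters active per token
    return h

rng = np.random.default_rng(0)
d, d_ff, n_layers = 8, 32, 4          # toy model dimensions
layers = [(rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d)))
          for _ in range(n_layers)]
tokens = rng.normal(size=(5, d))      # a batch of 5 token embeddings
out = dense_forward(tokens, layers)
print(out.shape)                      # (5, 8)
```

Because every weight participates in every token's forward pass, per-token FLOPs scale directly with total parameter count.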

Verbatim from Peter Corless (Redpanda, 2026-01-13):

"Anthropic Claude remains a single model, known as a Dense Transformer." (Source: sources/2026-01-13-redpanda-the-convergence-of-ai-and-data-streaming-part-1-the-coming-brick-walls)

As of the post's January 2026 publication, Claude is named as the notable exception to the industry-wide shift of frontier LLMs toward MoE: GPT-4 (reportedly 8 × 220B experts, per the 2023 George Hotz leak), Google Gemini (MoE since 1.5), and xAI Grok (MoE since Grok-1) are all MoE; Claude is the named dense holdout.

Dense vs MoE trade-off (frontier-LLM shape)

  • Dense: every parameter is active for every token. Simpler to serve (no routing), but per-token compute grows with total parameter count, so scaling past ~10^12 parameters requires proportional growth in serving compute, and large batches contend for the same full set of weights.
  • MoE: a learned router selects a top-k subset of experts per token. Total parameter count can grow without proportional per-token compute, at the cost of routing overhead, expert load-imbalance risk, and a more complex serving stack.
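The contrast above can be sketched with a minimal NumPy top-k MoE feed-forward layer (hypothetical toy shapes and a naive per-token dispatch loop; real serving stacks batch and shard this):

```python
import numpy as np

def moe_forward(tokens, experts, router_w, k=2):
    """Top-k MoE FFN sketch: a learned router scores E experts per token
    and only the k selected experts' parameters run for that token."""
    logits = tokens @ router_w                      # (T, E) routing scores
    topk = np.argsort(logits, axis=-1)[:, -k:]      # indices of k best experts
    gates = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)           # softmax over selected experts
    out = np.zeros_like(tokens)
    for t in range(tokens.shape[0]):                # naive per-token dispatch
        for slot in range(k):
            w_up, w_down = experts[topk[t, slot]]
            out[t] += gates[t, slot] * (np.maximum(tokens[t] @ w_up, 0.0) @ w_down)
    return out

rng = np.random.default_rng(0)
d, d_ff, E = 8, 32, 16                  # 16 experts hold 16x a single expert's
experts = [(rng.normal(size=(d, d_ff)), # parameters, but each token still runs
            rng.normal(size=(d_ff, d))) # through only k=2 of them
           for _ in range(E)]
router_w = rng.normal(size=(d, E))
out = moe_forward(rng.normal(size=(5, d)), experts, router_w, k=2)
print(out.shape)                        # (5, 8)
```

Per-token compute here is fixed by k, not by E, which is exactly why MoE lets total parameter count grow without proportional serving cost; the dispatch loop and gating are where the routing overhead and load-imbalance risk named above come from.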

Corless notes that both shapes share the same batch-trained limitation (the frontier-model batch-training boundary): dense or MoE, all frontier models are pre-trained offline in batch.

Caveats

  • Single-sourced on the wiki. Only the Corless 2026-01-13 post canonicalises the Dense-vs-MoE frontier-LLM split here. Anthropic has not publicly confirmed Claude's architectural shape; the claim reflects widely reported industry understanding.
  • "Dense Transformer" is a contrast term, not a formal category. The wiki uses it as the foil to MoE; in the ML literature the more common term is simply "transformer" with MoE as the variant.
  • Stub. This page is a minimal definition for cross-linking; deeper dense-vs-sparse routing comparison is deferred.
