CONCEPT Cited by 1 source
Dense Transformer¶
Definition¶
A Dense Transformer is the frontier-LLM architectural shape in which the model is a single transformer stack: every token passes through every parameter at every layer. It contrasts with a Mixture of Experts (MoE), which routes each token to a sparse subset of experts.
Verbatim from Peter Corless (Redpanda, 2026-01-13):
"Anthropic Claude remains a single model, known as a Dense Transformer." (Source: sources/2026-01-13-redpanda-the-convergence-of-ai-and-data-streaming-part-1-the-coming-brick-walls)
As of the post's 2026-01 publication, Claude is the named exception to the industry-wide shift of frontier LLMs to MoE: GPT-4 (8 × 220B experts, per the 2023 George Hotz leak), Google Gemini (MoE since 1.5), and xAI Grok (MoE since Grok-1) are all MoE, leaving Claude as the dense holdout.
Dense vs MoE trade-off (frontier-LLM shape)¶
- Dense: every parameter is active for every token. No routing machinery and a simpler serving stack, but heavier intra-batch contention, and per-token compute grows with total parameter count, so scaling past ~10^12 parameters requires proportional growth in serving compute.
- MoE: per-token top-k routing to a sparse expert subset. Total parameter count can grow without proportional per-token compute, at the cost of routing overhead, load imbalance risks, and more complex serving stacks.
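The trade-off above can be made concrete with a toy sketch. This is illustrative only, with made-up tiny dimensions and a naive softmax router; it is not any real model's code. It shows that a dense FFN touches all of its weights for every token, while a top-k MoE layer holds more total parameters but activates only a fixed fraction of them per token:

```python
# Toy contrast of per-token active parameters: dense FFN vs top-k MoE.
# All sizes here are hypothetical, chosen small for illustration.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32          # tiny hypothetical dims
n_experts, top_k = 4, 2        # MoE: route each token to 2 of 4 experts

x = rng.standard_normal((5, d_model))          # 5 tokens

# Dense: one FFN; every token multiplies through all of its weights.
W1 = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))
dense_out = np.maximum(x @ W1, 0) @ W2
dense_active = W1.size + W2.size               # params touched per token

# MoE: n_experts FFN copies plus a router; each token only touches
# the weights of its top-k experts.
experts = [(rng.standard_normal((d_model, d_ff)),
            rng.standard_normal((d_ff, d_model))) for _ in range(n_experts)]
Wr = rng.standard_normal((d_model, n_experts))  # router weights
logits = x @ Wr
gates = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
topk = np.argsort(logits, axis=1)[:, -top_k:]   # per-token expert choice

moe_out = np.zeros_like(x)
for t in range(x.shape[0]):
    for e in topk[t]:
        w1, w2 = experts[e]
        moe_out[t] += gates[t, e] * (np.maximum(x[t] @ w1, 0) @ w2)

moe_total = sum(w1.size + w2.size for w1, w2 in experts)
moe_active = top_k * (d_model * d_ff + d_ff * d_model)

print(f"dense: total={dense_active}, active/token={dense_active}")
print(f"moe:   total={moe_total}, active/token={moe_active}")
```

With these toy numbers the MoE layer holds 4× the dense layer's parameters but activates only half of them per token, which is the sparse-compute advantage the trade-off describes; the per-token loop also hints at the routing overhead and load-imbalance risk a real serving stack must handle.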
Corless notes that both shapes share the same batch-trained limitation (the frontier-model batch-training boundary): dense or MoE, all are offline-batch pre-trained.
Caveats¶
- Single-sourced on the wiki. Only the Corless 2026-01-13 post canonicalises the Dense vs MoE frontier-LLM split here. Anthropic has not publicly confirmed Claude's architectural shape; the claim is widely reported industry understanding, not a confirmed fact.
- "Dense Transformer" is a contrast term, not a formal category. The wiki uses it as the foil to MoE; in the ML literature the more common term is simply "transformer" with MoE as the variant.
- Stub. This page is a minimal definition for cross-linking; deeper dense-vs-sparse routing comparison is deferred.
Seen in¶
- 2026-01-13 Redpanda — The convergence of AI and data streaming, Part 1 (sources/2026-01-13-redpanda-the-convergence-of-ai-and-data-streaming-part-1-the-coming-brick-walls) — canonical: Claude named as the dense-transformer holdout among frontier LLMs.
Related¶
- concepts/mixture-of-experts — the foil shape.
- systems/transformer — the architecture primitive.
- concepts/frontier-model-batch-training-boundary — the structural limitation both shapes share.
- companies/redpanda — the company whose blog canonicalises this framing.