LLM hallucination¶
Definition¶
LLM hallucination is the failure mode where a language model "confidently makes claims that are incorrect" — it generates output that is linguistically plausible and high-confidence under the model's own distribution but factually wrong relative to real-world knowledge (Source: sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers).
Factuality, by contrast, is the ability to generate content consistent with real-world knowledge. Hallucination is a failure of factuality — the model's outputs are internally coherent but externally wrong.
Root causes (per the SLED post's framing)¶
The 2025-09-17 Google Research SLED blog post enumerates the standard causes:
- Incomplete, inaccurate, or biased training data — the model absorbs whatever the corpus contains; gaps in the corpus become gaps in its knowledge that the model fills with plausible confabulation rather than refusing.
- Overfitting or underfitting — either extreme produces fabrication: overfitting memorises noise, underfitting smooths away relevant context.
- Lack of real-world experience — the model has no grounding outside its corpus; predictions about the physical world, recent events, or unstated context can only be inferred from text.
- Ambiguous questions — the model resolves ambiguity toward the most frequent interpretation in training data, which may not be the intended one (Source: sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers).
The SLED post adds a fifth, structural cause specific to how decoding works: training-data frequency biases the final layer of the transformer toward "popular" completions even when the model has the factually correct alternative somewhere in its stack. This is the failure mode factuality decoding targets directly — the correct answer is present in intermediate early-exit logits but the final-layer-only decoding rule throws it away (Source: sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers).
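This structural cause can be made concrete with a toy sketch. The numbers below are invented for illustration (this is not SLED's algorithm): each row stands for the logits produced by applying the shared unembedding head to one layer's hidden state, and the point is only that a frequency-biased final layer can outvote a factually correct intermediate layer.

```python
import numpy as np

# Toy per-layer logits over a 3-token vocabulary for a prompt like
# "The capital of British Columbia is". Values are invented.
vocab = ["Vancouver", "Victoria", "Toronto"]
layer_logits = np.array([
    [0.1, 0.0, 0.0],   # early layer: mostly uninformative
    [0.5, 2.0, 0.2],   # middle layer: factual signal favours "Victoria"
    [2.5, 1.8, 0.1],   # final layer: frequency bias favours "Vancouver"
])

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

final = softmax(layer_logits[-1])
middle = softmax(layer_logits[1])

print(vocab[int(final.argmax())])   # final-layer-only decoding picks the popular token
print(vocab[int(middle.argmax())])  # an intermediate layer still ranks the correct token first
```

The factuality-decoding bet is that reading those intermediate distributions, instead of discarding them, recovers the correct token.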
Canonical worked examples (SLED post)¶
- Popular-but-wrong named entity. "What is the capital of British Columbia?" The model answers Vancouver (a bigger, better-known BC city) rather than Victoria (the actual capital). The model has seen "British Columbia" and "Vancouver" together orders of magnitude more often than "British Columbia" and "Victoria" in training text; the final layer's distribution reflects that frequency (Source: sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers).
- Teleportation hallucination. "What happens if you step into a lit fireplace and state a location?" The unwanted hallucinated answer: "This action could be interpreted as a form of teleportation magic, where stating a location while stepping into the fire would magically transport you to that place." The model resolves the ambiguity fiction-ward despite no fictional framing in the prompt. The wanted answer: "You will be injured" or "You may suffer from severe burns" (Source: sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers).
- Pattern-completion hallucination. Word problem with a discount: "6 toys × 10 tokens each, 10% off if ≥4 toys." The model continues "6 × 10 = 60" following the generic "A × B = C" pattern from training data, dropping the discount the problem specified. Contextual correctness loses to pattern frequency (Source: sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers).
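The word-problem example is small enough to check directly. A few lines show the gap between the pattern-completion answer and the contextually correct one (variable names are illustrative):

```python
# The word problem from the example: 6 toys at 10 tokens each,
# with a 10% discount when buying 4 or more toys.
toys, price, discount, threshold = 6, 10, 0.10, 4

subtotal = toys * price  # 6 × 10 = 60: the generic "A × B = C" completion
total = subtotal * (1 - discount) if toys >= threshold else subtotal

print(subtotal)  # 60 — what the model emits by following the pattern
print(total)     # 54.0 — the discounted total the problem actually asked for
```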
Remediation classes¶
The SLED post names three categories of hallucination remediation and positions factuality decoding as the cheapest of them:
- Retrieval-augmented generation (RAG). Retrieve relevant documents from an external knowledge base at inference time and condition the model on them. "Requires a more complicated system to identify and retrieve relevant data, and even then, LLMs may still hallucinate."
- Fine-tuning. Update the model's weights on curated correct-answer data. Requires labelled data, compute, and a retraining pipeline.
- Factuality decoding. Modify the decoding function only; no new data, no new weights. SLED and DoLa are the named instances. Pitched as "no external knowledge base or data fine-tuning" — low integration cost, measurable factuality gain (Source: sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers).
These are not mutually exclusive. Factuality decoding composes with RAG (RAG retrieves external facts, factuality decoding surfaces the model's internal ones) and with fine-tuning (tune for a domain, then decode with SLED on top). The SLED post names composability with other factuality decoders explicitly; RAG + factuality-decoder stacks are implicit.
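The "decoding function only" claim can be sketched structurally. Everything below is hypothetical scaffolding, not SLED's or DoLa's actual maths: the point is that a factuality decoder swaps the logits-to-token rule while the rest of the pipeline, weights, and data stay untouched.

```python
from typing import Callable, List

Logits = List[float]

def greedy(logits: Logits) -> int:
    """Take the argmax of a single logits vector."""
    return max(range(len(logits)), key=lambda i: logits[i])

def decode_step(all_layer_logits: List[Logits],
                rule: Callable[[List[Logits]], int]) -> int:
    # The only thing a factuality decoder changes is `rule`; no new
    # weights, no external knowledge base.
    return rule(all_layer_logits)

# Baseline rule: ignore intermediate layers, read the final layer only.
baseline_rule = lambda layers: greedy(layers[-1])

# Toy stand-in for a factuality rule (illustrative, not SLED's method):
# average the logits across layers before taking the argmax.
def average_rule(layers: List[Logits]) -> int:
    avg = [sum(col) / len(layers) for col in zip(*layers)]
    return greedy(avg)

layers = [[0.5, 2.0, 0.2],   # intermediate layer favours token 1
          [2.5, 1.8, 0.1]]   # final layer favours token 0

print(decode_step(layers, baseline_rule))  # 0
print(decode_step(layers, average_rule))   # 1
```

Because the change is confined to one function at the decode step, it composes naturally with RAG (which changes the input context) and fine-tuning (which changes the weights).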
Why factuality decoding exists as a category¶
The structural premise behind factuality decoding is that the LLM already contains the correct answer in its intermediate-layer representations; the final-layer-only decoding rule is what throws that answer away. If the premise holds, then fixing hallucination doesn't require new knowledge at all — it requires a different rule for reading the model's existing state (Source: sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers).
This is not a universal remedy — hallucinations driven by corpus gaps (the model genuinely doesn't know) can't be fixed by different decoding. But for the "popular but wrong" failure mode — where the correct signal is present but drowned out by frequency bias — factuality decoding is measurably effective (up to +16 percentage points vs base in SLED's reported case).
Seen in¶
- sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers — the SLED post defines hallucination, enumerates causes, names the three remediation categories, and positions SLED as the factuality-decoding instance that dominates DoLa.
Related¶
- concepts/factuality-decoding — the remediation category this failure mode motivates.
- systems/sled — the SOTA (NeurIPS 2024) factuality decoder.
- systems/dola — the prior-SOTA factuality decoder.
- concepts/early-exit-logits — the signal factuality decoders use to dodge the "popular but wrong" failure mode.
- concepts/logits — the primitive.
- concepts/llm-decoding-step — the remediation insertion point.