Hallucination
The tendency of an LLM to generate confident, fluent, plausible-sounding statements that are factually incorrect, unsupported, or fabricated — without any internal signal that the output is unreliable. The model does not know it doesn’t know.
Hallucination is the LLM analog of Overconfidence Bias, but with a structural difference: human overconfidence involves underestimating uncertainty; LLM hallucination involves surfacing no uncertainty signal at all for many outputs, even where the model may represent that uncertainty internally (see Empirical Evidence).
Mechanism
LLMs are trained to predict the next token given a context. The training objective has no direct mechanism for:
- Distinguishing “I learned this from reliable sources” from “this token sequence looks plausible”
- Refusing to generate when evidence is thin
- Calibrating confidence to evidence quality
Result: the model generates fluent text regardless of reliability. Fluency and factual accuracy are decoupled. The model’s “confidence” is expressed through register (hedging words, assertive tone) — but this register is also learned from patterns, not from actual epistemic state.
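A toy illustration of the point above, assuming made-up logits for a single next-token decision (the numbers and the `greedy_token` helper are hypothetical, not taken from any real model). Softmax always yields a full distribution, so greedy decoding always emits some token; the emitted text looks the same whether the top probability is high or barely above its rivals.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(logits: dict[str, float]) -> tuple[str, float]:
    """Return the most likely token and its probability."""
    probs = softmax(list(logits.values()))
    return max(zip(logits.keys(), probs), key=lambda pair: pair[1])


# Hypothetical logits for the year in "The paper was published in ____".
well_supported = {"1999": 9.0, "1995": 2.0, "2001": 1.0}
thin_evidence = {"1999": 1.2, "1995": 1.0, "2001": 0.9}

token_a, prob_a = greedy_token(well_supported)
token_b, prob_b = greedy_token(thin_evidence)

# Both cases emit the same token; only the hidden probability differs.
print(token_a, round(prob_a, 2))
print(token_b, round(prob_b, 2))
```

The gap between `prob_a` and `prob_b` is exactly the information that never reaches the reader unless something explicitly surfaces it.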
Types of hallucination:
| Type | Description | Example |
|---|---|---|
| Factual fabrication | States false facts as true | Fake citations, wrong dates, invented statistics |
| Confabulation | Fills gaps in knowledge with plausible-sounding content | “As Heuer wrote in his 1995 paper…” (paper doesn’t exist) |
| Over-generalization | Applies a pattern to cases where it doesn’t hold | Correct principle applied to wrong domain |
| Source hallucination | Invents or misattributes sources | Cites a real author but wrong paper, or invents a paper entirely |
| Reasoning hallucination | Produces correct-seeming intermediate reasoning steps that lead to wrong conclusions | Math errors with confident presentation |
Why It Matters for Agentic Systems
In a single-turn chat, a hallucinated fact can be caught by a skeptical user. In agentic systems, hallucination is more dangerous:
- Tool-use agents: a hallucinated API endpoint, SQL table, or file path causes a cascade of downstream errors
- Multi-agent pipelines: Agent B receives Agent A’s hallucinated output as grounded fact; it builds on it; the hallucination propagates and compounds
- Long-horizon tasks: early hallucinations shape later steps; by the time the error manifests, the causal chain is long and the fix is costly
- Self-consistency amplification: if an agent generates a false claim and then checks it by re-generating, the same false pattern may re-emerge (the model is consistent with itself, not with truth)
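One way to limit the tool-use cascade described above is to validate agent-proposed identifiers against the real environment before acting on them. The sketch below assumes a SQLite database and a hypothetical list of table names proposed by an agent; the function names are illustrative only.

```python
import sqlite3


def known_tables(conn: sqlite3.Connection) -> set[str]:
    """Return the names of tables that actually exist in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return {name for (name,) in rows}


def check_table_reference(conn: sqlite3.Connection, table: str) -> bool:
    """True only if the agent-proposed table name exists in the schema."""
    return table in known_tables(conn)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")

# Hypothetical agent output: one real table, one invented one.
proposed = ["orders", "customer_invoices"]
rejected = [t for t in proposed if not check_table_reference(conn, t)]
print(rejected)
```

The check consults the schema rather than asking the model again, which sidesteps the self-consistency problem: re-generation can repeat the same false pattern, whereas a lookup against ground truth cannot.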
Relationship to Human Biases
| Hallucination Pattern | Human Analogue | Difference |
|---|---|---|
| High confidence on weak evidence | Overconfidence Bias | Human overconfidence: underestimates uncertainty. LLM: no uncertainty signal. |
| Fabricates supporting sources | Confirmation Bias | Human: selectively attends to confirming sources. LLM: invents them. |
| Plausible-sounding gap-filling | Availability Heuristic | Human: uses available patterns. LLM: uses fluency patterns from training. |
| Consistent-with-prior outputs | Anchoring Bias | Human: adjusts from anchor. LLM: maintains self-consistency. |
SAT Countermeasures
| Technique | How It Counters Hallucination |
|---|---|
| Quality of Information Check | Forces explicit source audit — agent must identify what it actually has access to vs. what it’s inferring |
| Key Assumptions Check | Requires distinguishing “known fact” from “assumed” from “inferred” — surfaces the epistemic gaps hallucination fills |
| Devil’s Advocacy | Adversarial review agent looks specifically for unsupported claims, missing citations, fabricated specifics |
| Indicators or Signposts of Change | Forces explicit statement of what evidence would be needed to support each claim — hallucinated claims typically can’t specify real evidence |
Prompt Patterns
Pattern 1 — Epistemic labeling:
"For each factual claim in your response, label it as one of:
[KNOWN] — you have high confidence from training data
[INFERRED] — logical deduction from known facts
[UNCERTAIN] — you are not confident; this may be wrong
[UNKNOWN] — you do not have this information
Do not generate [UNKNOWN] claims. Flag [UNCERTAIN] claims explicitly."
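Labels are only useful if something downstream reads them. A minimal sketch of a parser for Pattern 1 output follows, assuming the model places each label immediately before its claim; the sample response and the `group_claims` helper are hypothetical.

```python
import re
from collections import defaultdict

# A label tag followed by claim text, up to the next tag or end of input.
LABEL_RE = re.compile(
    r"\[(KNOWN|INFERRED|UNCERTAIN|UNKNOWN)\]\s*(.*?)"
    r"(?=\[(?:KNOWN|INFERRED|UNCERTAIN|UNKNOWN)\]|\Z)",
    re.DOTALL,
)


def group_claims(response: str) -> dict[str, list[str]]:
    """Group labeled claims by their epistemic label."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for label, claim in LABEL_RE.findall(response):
        grouped[label].append(claim.strip())
    return dict(grouped)


# Hypothetical model output following the labeling instruction.
response = (
    "[KNOWN] Paris is the capital of France.\n"
    "[UNCERTAIN] The report was published in 2019.\n"
    "[UNKNOWN] The author's middle name is Lee."
)

grouped = group_claims(response)
# Claims a pipeline might route to human or tool-based verification.
needs_review = grouped.get("UNCERTAIN", []) + grouped.get("UNKNOWN", [])
print(needs_review)
```

Note that the labels themselves are generated text, so they inherit the register problem described under Mechanism; this parser makes them actionable but does not make them trustworthy.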
Pattern 2 — Source grounding:
"Only make factual claims that you can ground in the documents provided
in this context. If a claim is not supported by the provided context,
say so explicitly rather than drawing on training data."
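Source grounding can also be spot-checked after generation. The sketch below is a deliberately crude lexical-overlap heuristic, not an entailment check: it flags claims whose content words mostly do not appear in the provided context. The threshold, the word-length cutoff, and the sample texts are all arbitrary assumptions for illustration.

```python
import re


def content_words(text: str) -> set[str]:
    """Lowercased words of four or more letters, a rough content-word proxy."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def support_score(claim: str, context: str) -> float:
    """Fraction of the claim's content words that also appear in the context."""
    words = content_words(claim)
    if not words:
        return 0.0
    return len(words & content_words(context)) / len(words)


context = "The audit found that the service stored passwords in plain text."
claims = [
    "The service stored passwords in plain text.",
    "The breach exposed forty thousand customer records.",
]

THRESHOLD = 0.6  # arbitrary cutoff chosen for this illustration
unsupported = [c for c in claims if support_score(c, context) < THRESHOLD]
print(unsupported)
```

A heuristic like this misses paraphrases and cannot detect a claim that reuses the context's vocabulary to say something false, so it is best treated as a cheap first filter ahead of a stronger check.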
Pattern 3 — Absence acknowledgment:
"Before answering, list: what information would you need to answer
this question reliably? Which of that do you actually have?
What are you filling in from pattern rather than evidence?"
Distinction: Hallucination vs. Sycophancy
Both are LLM-native failure modes, but the trigger differs:
| | Sycophancy | Hallucination |
|---|---|---|
| Trigger | Social/approval signal (user preference) | Fluency signal (plausible completion) |
| Target | Agreement with user’s views | Filling gaps with fluent content |
| Primary SAT counter | Devil’s Advocacy, Team A/Team B | Quality of Information Check, Key Assumptions Check |
| Mitigation | Adversarial prompting, role assignment | Source grounding, epistemic labeling |
In practice they interact: a hallucinated claim that the user implicitly wants to be true will be sycophantically reinforced rather than corrected.
Empirical Evidence
| Study | Finding |
|---|---|
| Huang et al. (2023) | Canonical survey. Key taxonomic split: factuality (claim wrong about world) vs. faithfulness (claim unsupported by provided source). RAG does not eliminate either. |
| Kadavath et al. (2022) | Large models can be queried for calibrated assessments of their own correctness (P(True)) — uncertainty information is present internally but not surfaced by default generation. |
Implication: hallucination is not pure ignorance; it is a failure to surface uncertainty the model already represents internally. SAT-style epistemic labeling prompts therefore have a plausible mechanism: they query for information that exists. See H7.
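A rough sketch of how a P(True)-style self-evaluation could be wired up, assuming the caller can obtain log-probabilities for the two answer options from whatever model API is in use. The prompt wording is illustrative rather than the exact template from Kadavath et al., and both function names are hypothetical.

```python
import math


def build_self_eval_prompt(question: str, proposed_answer: str) -> str:
    """Ask the model to judge a proposed answer as True or False."""
    return (
        f"Question: {question}\n"
        f"Proposed answer: {proposed_answer}\n"
        "Is the proposed answer:\n"
        "(A) True\n"
        "(B) False\n"
        "The proposed answer is:"
    )


def p_true(logprob_true: float, logprob_false: float) -> float:
    """Normalize the two option log-probabilities into P(True)."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logprob_true, logprob_false)
    e_true = math.exp(logprob_true - m)
    e_false = math.exp(logprob_false - m)
    return e_true / (e_true + e_false)


prompt = build_self_eval_prompt("Who wrote the 1995 paper?", "Heuer")

# Hypothetical log-probabilities for the "(A)" and "(B)" continuations.
score = p_true(math.log(0.6), math.log(0.2))
print(round(score, 2))
```

A pipeline could then route low-scoring answers to retrieval or human review instead of passing them downstream as grounded fact.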
See Also
Overconfidence Bias | Sycophancy | Quality of Information Check | Key Assumptions Check | SATs for LLM Agents