Hallucination

The tendency of an LLM to generate confident, fluent, plausible-sounding statements that are factually incorrect, unsupported, or fabricated, with no signal in the output that they are unreliable. The model does not know, or at least does not say, that it doesn’t know.

Hallucination is the LLM analog of Overconfidence Bias, but with a structural difference: human overconfidence means underestimating uncertainty; LLM hallucination means expressing no uncertainty signal at all for many outputs.


Mechanism

LLMs are trained to predict the next token given a context. The training objective has no direct mechanism for:

  1. Distinguishing “I learned this from reliable sources” from “this token sequence looks plausible”
  2. Refusing to generate when evidence is thin
  3. Calibrating confidence to evidence quality

Result: the model generates fluent text regardless of reliability; fluency and factual accuracy are largely decoupled. The model’s “confidence” is expressed through register (hedging words, assertive tone), but that register is also learned from textual patterns, not read off the model’s actual epistemic state.
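A toy illustration of the decoupling, assuming made-up logits over a hypothetical four-token vocabulary (no real model is involved): greedy decoding emits a token whether the next-token distribution is sharply peaked or nearly flat. The entropy that separates the two cases is computed along the way but never appears in the emitted text.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; higher means a flatter, less certain distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def greedy_next_token(vocab, logits):
    """Pick the most probable token. Note that this always returns *some* token."""
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best], entropy_bits(probs)

# Hypothetical vocabulary and logits, chosen only for illustration.
vocab = ["1995", "1999", "2003", "2010"]
confident = [6.0, 1.0, 0.5, 0.2]  # one option dominates
guessing = [1.1, 1.0, 0.9, 1.0]   # nearly uniform: effectively a guess

for name, logits in [("confident", confident), ("guessing", guessing)]:
    token, p, h = greedy_next_token(vocab, logits)
    # The emitted text is the same in both cases; only p and h differ.
    print(f"{name}: emits {token!r} (p={p:.2f}, entropy={h:.2f} bits)")
```

Both cases emit "1995". Real decoders often sample instead of taking the argmax, but the point carries over: nothing in the decoding step declines to emit a token when the distribution is flat. Token-level entropy is also only a rough proxy for factual uncertainty, not a direct measure of it.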

Types of hallucination:

| Type | Description | Example |
| --- | --- | --- |
| Factual fabrication | States false facts as true | Fake citations, wrong dates, invented statistics |
| Confabulation | Fills gaps in knowledge with plausible-sounding content | “As Heuer wrote in his 1995 paper…” (paper doesn’t exist) |
| Over-generalization | Applies a pattern to cases where it doesn’t hold | Correct principle applied to wrong domain |
| Source hallucination | Invents or misattributes sources | Cites a real author but wrong paper, or invents a paper entirely |
| Reasoning hallucination | Produces correct-seeming intermediate reasoning steps that lead to wrong conclusions | Math errors with confident presentation |

Why It Matters for Agentic Systems

In a single-turn chat, a hallucinated fact can be caught by a skeptical user. In agentic systems, hallucination is more dangerous:

  • Tool-use agents: a hallucinated API endpoint, SQL table, or file path causes a cascade of downstream errors
  • Multi-agent pipelines: Agent B receives Agent A’s hallucinated output as grounded fact; it builds on it; the hallucination propagates and compounds
  • Long-horizon tasks: early hallucinations shape later steps; by the time the error manifests, the causal chain is long and the fix is costly
  • Self-consistency amplification: if an agent generates a false claim and then checks it by re-generating, the same false pattern may re-emerge (the model is consistent with itself, not with truth)
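One common guard for tool-use agents is to validate each model-proposed tool call against what actually exists before executing it. A minimal sketch, assuming a hypothetical registry (`TOOLS`), a dict-shaped call (`{"tool": ..., "args": {...}}`), and a set of file paths the agent has actually observed; none of this is a specific framework’s API.

```python
from dataclasses import dataclass, field

@dataclass
class ToolSpec:
    """Declares a tool the agent may call and the parameters it accepts."""
    name: str
    required: set[str]
    optional: set[str] = field(default_factory=set)

# Hypothetical registry: the only tools that really exist in this environment.
TOOLS = {
    "read_file": ToolSpec("read_file", required={"path"}),
    "run_sql": ToolSpec("run_sql", required={"query"}, optional={"timeout_s"}),
}

def validate_call(call: dict, known_paths: set[str]) -> list[str]:
    """Return a list of problems with a proposed call; empty means it looks grounded."""
    spec = TOOLS.get(call.get("tool"))
    if spec is None:
        return [f"unknown tool: {call.get('tool')!r}"]
    args = call.get("args", {})
    arg_names = set(args)
    problems = []
    missing = spec.required - arg_names
    extra = arg_names - spec.required - spec.optional
    if missing:
        problems.append(f"missing args: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected args: {sorted(extra)}")
    # Check paths against what the agent has actually observed, not what it assumes.
    if spec.name == "read_file" and args.get("path") not in known_paths:
        problems.append(f"path not observed: {args.get('path')!r}")
    return problems
```

The check runs against external state (the registry, paths the agent has listed) rather than asking the model to re-confirm its own call, which would run into the self-consistency problem described above.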

Relationship to Human Biases

| Hallucination Pattern | Human Analogue | Difference |
| --- | --- | --- |
| High confidence on weak evidence | Overconfidence Bias | Human overconfidence: underestimates uncertainty. LLM: no uncertainty signal. |
| Fabricates supporting sources | Confirmation Bias | Human: selectively attends to confirming sources. LLM: invents them. |
| Plausible-sounding gap-filling | Availability Heuristic | Human: uses available patterns. LLM: uses fluency patterns from training. |
| Consistent-with-prior outputs | Anchoring Bias | Human: adjusts from anchor. LLM: maintains self-consistency. |

SAT Countermeasures

| Technique | How It Counters Hallucination |
| --- | --- |
| Quality of Information Check | Forces an explicit source audit: the agent must identify what it actually has access to vs. what it’s inferring |
| Key Assumptions Check | Requires distinguishing “known fact” from “assumed” from “inferred”; surfaces the epistemic gaps hallucination fills |
| Devil’s Advocacy | Adversarial review agent looks specifically for unsupported claims, missing citations, fabricated specifics |
| Indicators or Signposts of Change | Forces explicit statement of what evidence would be needed to support each claim; hallucinated claims typically can’t specify real evidence |
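A minimal sketch of the source-audit step in a Quality of Information Check, assuming each claim has already been extracted along with the IDs of the sources it cites. The `Claim` structure, the report categories, and the example IDs are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    cited_sources: list[str]  # source IDs the agent says support this claim

def audit_claims(claims: list[Claim], available_sources: set[str]) -> dict[str, list[str]]:
    """Split claims into grounded, uncited, and citing sources the agent never had."""
    report = {"grounded": [], "uncited": [], "phantom_source": []}
    for claim in claims:
        if not claim.cited_sources:
            # No stated source: inferred or filled in from pattern.
            report["uncited"].append(claim.text)
        elif any(src not in available_sources for src in claim.cited_sources):
            # Cites something outside the context: a candidate source hallucination.
            report["phantom_source"].append(claim.text)
        else:
            report["grounded"].append(claim.text)
    return report
```

“Grounded” here only means every cited source exists in the agent’s context. Whether the source actually supports the claim (faithfulness) still needs a separate check, for example by a Devil’s Advocacy reviewer.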

Prompt Patterns

Pattern 1 — Epistemic labeling:

"For each factual claim in your response, label it as one of:
  [KNOWN] — you have high confidence from training data
  [INFERRED] — logical deduction from known facts
  [UNCERTAIN] — you are not confident; this may be wrong
  [UNKNOWN] — you do not have this information

Do not generate [UNKNOWN] claims. Flag [UNCERTAIN] claims explicitly."
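Compliance with the labeling prompt can be checked mechanically. A minimal sketch, assuming the model is also told to put one claim per line with the label first (an output convention the prompt above does not state, so it would need to be added). The function name and return shape are hypothetical.

```python
import re
from collections import defaultdict

LABELS = ("KNOWN", "INFERRED", "UNCERTAIN", "UNKNOWN")
LABEL_RE = re.compile(r"^\s*\[(" + "|".join(LABELS) + r")\]\s*(.+?)\s*$")

def check_epistemic_labels(response: str) -> dict:
    """Group claims by label and flag violations of the labeling prompt."""
    by_label = defaultdict(list)
    unlabeled = []
    for line in response.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        m = LABEL_RE.match(line)
        if m:
            by_label[m.group(1)].append(m.group(2))
        else:
            unlabeled.append(line.strip())
    return {
        "claims": dict(by_label),
        # The prompt forbids [UNKNOWN] claims; unlabeled lines also break the format.
        "violations": by_label.get("UNKNOWN", []) + unlabeled,
        # [UNCERTAIN] claims should be surfaced to the user or verified downstream.
        "needs_review": by_label.get("UNCERTAIN", []),
    }
```

The labels are themselves generated text, so they are subject to the same register problem as any other output. Treat them as a triage signal for review, not as calibrated confidence.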

Pattern 2 — Source grounding:

"Only make factual claims that you can ground in the documents provided
in this context. If a claim is not supported by the provided context,
say so explicitly rather than drawing on training data."
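A sketch of how Pattern 2 might be assembled programmatically, assuming the retrieved documents arrive as ID-to-text pairs. The bracketed citation convention and the function name are hypothetical additions; the citation line makes the grounding instruction checkable afterwards, for example by a source audit.

```python
GROUNDING_INSTRUCTION = (
    "Only make factual claims that you can ground in the documents provided "
    "in this context. If a claim is not supported by the provided context, "
    "say so explicitly rather than drawing on training data."
)

def build_grounded_prompt(question: str, documents: dict[str, str]) -> str:
    """Assemble a source-grounded prompt with each document tagged by an ID."""
    if not documents:
        # With no context there is nothing to ground in; fail loudly instead of
        # letting the model fall back on training data.
        raise ValueError("source grounding requires at least one document")
    doc_blocks = [f"[{doc_id}]\n{text.strip()}" for doc_id, text in documents.items()]
    return "\n\n".join([
        GROUNDING_INSTRUCTION,
        # Hypothetical citation convention: cite supporting document IDs inline.
        "Cite the supporting document ID in brackets after each claim, e.g. [doc1].",
        "Documents:",
        *doc_blocks,
        f"Question: {question}",
    ])
```

Refusing to build the prompt with an empty context is a deliberate choice: an empty document list quietly turns a grounded prompt back into an ungrounded one.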

Pattern 3 — Absence acknowledgment:

"Before answering, list: what information would you need to answer
this question reliably? Which of that do you actually have?
What are you filling in from pattern rather than evidence?"
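Pattern 3 can also be run as an explicit two-step chain, so that the gap list becomes an artifact that can be logged or reviewed. A minimal sketch, assuming `complete` is a stand-in for whatever LLM call is in use (prompt in, text out); the wording of the second prompt is illustrative, not prescribed.

```python
from typing import Callable

ABSENCE_PROMPT = (
    "Before answering, list: what information would you need to answer "
    "this question reliably? Which of that do you actually have? "
    "What are you filling in from pattern rather than evidence?"
)

def answer_with_absence_check(question: str, complete: Callable[[str], str]) -> dict:
    """Run the absence-acknowledgment step first, then answer conditioned on it.

    `complete` is a hypothetical stand-in for an LLM call: prompt in, text out.
    """
    gaps = complete(f"{ABSENCE_PROMPT}\n\nQuestion: {question}")
    answer = complete(
        f"Question: {question}\n\n"
        f"Your own assessment of missing information:\n{gaps}\n\n"
        "Answer the question. For anything you listed as filled in from pattern, "
        "say so explicitly instead of stating it as fact."
    )
    return {"gaps": gaps, "answer": answer}
```

A single prompt containing both instructions is a reasonable alternative. The two-call version costs an extra request but keeps the self-assessment separate and inspectable.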

Distinction: Hallucination vs. Sycophancy

Both are LLM-native failure modes, but the trigger differs:

| | Sycophancy | Hallucination |
| --- | --- | --- |
| Trigger | Social/approval signal (user preference) | Fluency signal (plausible completion) |
| Target | Agreement with user’s views | Filling gaps with fluent content |
| Primary SAT counter | Devil’s Advocacy, Team A/Team B | Quality of Information Check, Key Assumptions Check (KAC) |
| Mitigation | Adversarial prompting, role assignment | Source grounding, epistemic labeling |

In practice they interact: a hallucinated claim that the user implicitly wants to be true is likely to be sycophantically reinforced rather than corrected.


Empirical Evidence

| Study | Finding |
| --- | --- |
| Huang et al. (2023) | Canonical survey. Key taxonomic split: factuality (claim wrong about world) vs. faithfulness (claim unsupported by provided source). RAG does not eliminate either. |
| Kadavath et al. (2022) | Large models can be queried for calibrated assessments of their own correctness (P(True)); uncertainty information is present internally but not surfaced by default generation. |

Implication: hallucination is not pure ignorance; it is, at least in part, a failure to surface uncertainty the model already represents internally. SAT-style epistemic labeling prompts therefore have a plausible mechanism of action, because they query for information that exists. See H7.
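A sketch of a P(True)-style self-evaluation, loosely modeled on the Kadavath et al. setup: show the model a question and a proposed answer, ask whether the answer is true, and read the probability it assigns to each option. The template wording, the renormalization over the two options, and the 0.5 threshold are illustrative assumptions; how to obtain the option log-probabilities depends on the model API.

```python
import math

def p_true_prompt(question: str, proposed_answer: str) -> str:
    """Self-evaluation prompt, loosely modeled on the P(True) setup."""
    return (
        f"Question: {question}\n"
        f"Proposed Answer: {proposed_answer}\n"
        "Is the proposed answer:\n (A) True\n (B) False\n"
        "The proposed answer is:"
    )

def p_true(logprob_true: float, logprob_false: float) -> float:
    """Renormalize the model's probability mass over the two options."""
    m = max(logprob_true, logprob_false)  # subtract the max for numerical stability
    a = math.exp(logprob_true - m)
    b = math.exp(logprob_false - m)
    return a / (a + b)

def surface_uncertainty(answer: str, logprob_true: float, logprob_false: float,
                        threshold: float = 0.5) -> str:
    """Attach an explicit uncertainty flag instead of emitting the answer bare."""
    p = p_true(logprob_true, logprob_false)
    if p < threshold:
        return f"[UNCERTAIN, P(True)={p:.2f}] {answer}"
    return answer
```

The caller fetches the log-probabilities of the "(A)" and "(B)" continuations after the prompt. How well this score is calibrated is an empirical property of a given model and prompt format, so the threshold should be tuned on labeled examples rather than assumed.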


See Also

Overconfidence Bias | Sycophancy | Quality of Information Check | Key Assumptions Check | SATs for LLM Agents