Hallucination

The tendency of an LLM to generate confident, fluent, plausible-sounding statements that are factually incorrect, unsupported, or fabricated, with no signal in the output that the content is unreliable. By default, the model does not flag what it doesn’t know.

Hallucination is the LLM analog of Overconfidence Bias, but with a structural difference: human overconfidence involves underestimating uncertainty, whereas LLM hallucination involves surfacing no uncertainty signal at all for many outputs.

Mechanism

LLMs are trained to predict the next token given a context. The training objective has no direct mechanism for:

Distinguishing “I learned this from reliable sources” from “this token sequence looks plausible”
Refusing to generate when evidence is thin
Calibrating confidence to evidence quality

Result: the model generates fluent text regardless of reliability. Fluency and factual accuracy are decoupled. The model’s “confidence” is expressed through register (hedging words, assertive tone) — but this register is also learned from patterns, not from actual epistemic state.
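
For reference, the next-token objective described above can be written in its generic autoregressive form. This is the textbook cross-entropy loss, not a formula taken from any specific model's training recipe; the symbols θ (model parameters), x_t (the token at position t), and T (sequence length) are notation introduced here for illustration.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```

Every term rewards assigning high probability to the token that actually came next in the training text. No term refers to whether the resulting statement is true, which is one way to see why fluency and accuracy can come apart.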

Types of hallucination:

Type	Description	Example
Factual fabrication	States false facts as true	Fake citations, wrong dates, invented statistics
Confabulation	Fills gaps in knowledge with plausible-sounding content	“As Heuer wrote in his 1995 paper…” (paper doesn’t exist)
Over-generalization	Applies a pattern to cases where it doesn’t hold	Correct principle applied to wrong domain
Source hallucination	Invents or misattributes sources	Cites a real author but wrong paper, or invents a paper entirely
Reasoning hallucination	Produces correct-seeming intermediate reasoning steps that lead to wrong conclusions	Math errors with confident presentation

Why It Matters for Agentic Systems

In a single-turn chat, a hallucinated fact can be caught by a skeptical user. In agentic systems, hallucination is more dangerous:

Tool-use agents: a hallucinated API endpoint, SQL table, or file path causes a cascade of downstream errors
Multi-agent pipelines: Agent B receives Agent A’s hallucinated output as grounded fact; it builds on it; the hallucination propagates and compounds
Long-horizon tasks: early hallucinations shape later steps; by the time the error manifests, the causal chain is long and the fix is costly
Self-consistency amplification: if an agent generates a false claim and then checks it by re-generating, the same false pattern may re-emerge (the model is consistent with itself, not with truth)
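
One way to limit the tool-use cascade is to check model-generated identifiers against the real environment before acting on them. The sketch below does this for SQL table names using SQLite's schema table. The function names (`known_tables`, `unknown_tables`, `run_checked`) are hypothetical, and the regex is a deliberately crude assumption: it only sees plain names after FROM or JOIN and ignores quoted identifiers, schema-qualified names, and other SQL syntax.

```python
import re
import sqlite3

# Hypothetical guard: the names and the regex below are illustrative assumptions.
# The pattern only catches plain identifiers that follow FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def known_tables(conn: sqlite3.Connection) -> set[str]:
    """Return the lowercased names of tables that actually exist."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name.lower() for (name,) in rows}


def unknown_tables(conn: sqlite3.Connection, sql: str) -> list[str]:
    """List referenced table names that are absent from the schema."""
    existing = known_tables(conn)
    referenced = TABLE_REF.findall(sql)
    return sorted({t for t in referenced if t.lower() not in existing})


def run_checked(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute model-generated SQL only if every referenced table exists."""
    missing = unknown_tables(conn, sql)
    if missing:
        # Fail at the first step instead of letting the error propagate.
        raise ValueError(f"query references unknown tables: {missing}")
    return conn.execute(sql).fetchall()
```

The design choice is to fail at the first step with a message that names the suspect identifier, so the error is caught while the causal chain is still short. The same idea applies to file paths and API endpoints, with a different existence check.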

Relationship to Human Biases

Hallucination Pattern	Human Analogue	Difference
High confidence on weak evidence	Overconfidence Bias	Human: underestimates uncertainty. LLM: surfaces no uncertainty signal.
Fabricates supporting sources	Confirmation Bias	Human: selectively attends to confirming sources. LLM: invents them.
Plausible-sounding gap-filling	Availability Heuristic	Human: uses available patterns. LLM: uses fluency patterns from training.
Consistent-with-prior outputs	Anchoring Bias	Human: adjusts from anchor. LLM: maintains self-consistency.

SAT Countermeasures

Technique	How It Counters Hallucination
Quality of Information Check	Forces explicit source audit — agent must identify what it actually has access to vs. what it’s inferring
Key Assumptions Check	Requires distinguishing “known fact” from “assumed” from “inferred” — surfaces the epistemic gaps hallucination fills
Devil’s Advocacy	Adversarial review agent looks specifically for unsupported claims, missing citations, fabricated specifics
Indicators or Signposts of Change	Forces explicit statement of what evidence would be needed to support each claim — hallucinated claims typically can’t specify real evidence

Prompt Patterns

Pattern 1 — Epistemic labeling:

"For each factual claim in your response, label it as one of:
  [KNOWN] — you have high confidence from training data
  [INFERRED] — logical deduction from known facts
  [UNCERTAIN] — you are not confident; this may be wrong
  [UNKNOWN] — you do not have this information

Do not assert [UNKNOWN] claims; say that you do not have the information. Flag [UNCERTAIN] claims explicitly."
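
A labeled response is only useful if something downstream reads the labels. The following is a minimal sketch of such a post-processing step, assuming the model follows the bracketed format from the prompt. The function names and the shape of the returned dictionary are hypothetical. The labels are self-reported by the model, so this routes claims for review rather than verifying them.

```python
import re
from collections import defaultdict

LABELS = ("KNOWN", "INFERRED", "UNCERTAIN", "UNKNOWN")
_TAG = r"\[(?:%s)\]" % "|".join(LABELS)
# A label, then the claim text up to the next label or the end of the response.
LABELED_CLAIM = re.compile(
    r"\[(%s)\]\s*(.*?)(?=%s|\Z)" % ("|".join(LABELS), _TAG), re.DOTALL
)


def split_by_label(response: str) -> dict[str, list[str]]:
    """Group claim texts by their epistemic label."""
    claims = defaultdict(list)
    for label, text in LABELED_CLAIM.findall(response):
        text = text.strip()
        if text:
            claims[label].append(text)
    return dict(claims)


def review(response: str) -> dict:
    """Summarize a labeled response for a downstream gate (hypothetical policy)."""
    claims = split_by_label(response)
    return {
        # The prompt forbids [UNKNOWN] claims, so their presence is a violation.
        "reject": bool(claims.get("UNKNOWN")),
        # [UNCERTAIN] claims are passed along for external verification.
        "needs_verification": claims.get("UNCERTAIN", []),
        "labeled_claims": sum(len(v) for v in claims.values()),
    }
```

A response with zero labeled claims may mean the model ignored the format, so a caller would probably want to treat that case as a failure as well.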

Pattern 2 — Source grounding:

"Only make factual claims that you can ground in the documents provided
in this context. If a claim is not supported by the provided context,
say so explicitly rather than drawing on training data."

Pattern 3 — Absence acknowledgment:

"Before answering, list: what information would you need to answer
this question reliably? Which of that do you actually have?
What are you filling in from pattern rather than evidence?"

Distinction: Hallucination vs. Sycophancy

Both are LLM-native failure modes, but the trigger differs:

	Sycophancy	Hallucination
Trigger	Social/approval signal (user preference)	Fluency signal (plausible completion)
Target	Agreement with user’s views	Filling gaps with fluent content
Primary SAT counter	Devil’s Advocacy, Team A/Team B	Quality of Information Check, Key Assumptions Check
Mitigation	Adversarial prompting, role assignment	Source grounding, epistemic labeling

In practice they interact: a hallucinated claim that the user implicitly wants to be true will be sycophantically reinforced rather than corrected.

Empirical Evidence

Study	Finding
Huang et al. (2023)	Canonical survey. Key taxonomic split: factuality (claim wrong about world) vs. faithfulness (claim unsupported by provided source). RAG does not eliminate either.
Kadavath et al. (2022)	Large models can be queried for calibrated assessments of their own correctness (P(True)) — uncertainty information is present internally but not surfaced by default generation.

Implication: hallucination is not pure ignorance; it is at least partly a failure to surface uncertainty the model already represents internally. SAT-style epistemic labeling prompts therefore have a plausible mechanism: they query for information that exists. See H7.
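
The P(True) idea can be sketched as a small gate: propose an answer, ask the model whether that answer is true or false, and read off the probability it assigns to "True". The code below assumes the caller has already obtained log-probabilities for the two options; how to get them is API-specific and not shown. Normalizing over just the two options, the function names, and the 0.75 threshold are illustrative assumptions, not values from the cited study.

```python
import math


def p_true(logprob_true: float, logprob_false: float) -> float:
    """Normalize two option log-probabilities into a P(True) score."""
    m = max(logprob_true, logprob_false)  # subtract the max for numerical stability
    e_true = math.exp(logprob_true - m)
    e_false = math.exp(logprob_false - m)
    return e_true / (e_true + e_false)


def answer_or_abstain(
    answer: str,
    logprob_true: float,
    logprob_false: float,
    threshold: float = 0.75,  # hypothetical cutoff; tune on held-out data
) -> str:
    """Return the answer only when the self-assessed P(True) clears the threshold."""
    score = p_true(logprob_true, logprob_false)
    if score >= threshold:
        return answer
    return f"I am not confident in this answer (P(True) = {score:.2f})."
```

This surfaces the internal signal as an explicit abstention. How well the score is calibrated depends on the model and the domain, so the threshold should be validated rather than assumed.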
