Analysis of Competing Hypotheses (ACH)

A structured analytic technique (SAT) that identifies all plausible alternative explanations (hypotheses) and evaluates the evidence in order to disconfirm them rather than confirm them. It is generally regarded as the most rigorous of the diagnostic techniques.

Purpose

Overcome three common analytic mistakes:

Undue influence of a first impression based on incomplete data or on an existing analytic line
Failure to generate a full set of alternative hypotheses at the outset of a project
Relying on evidence that supports the preferred hypothesis but is also consistent with other explanations

When to Use

When there is a large amount of data to absorb and evaluate
Most effective with a small team (members challenge each other’s evidence evaluation)
Controversial issues requiring a clear audit trail
When considering the possibility of denial and deception
An initial matrix can be built in a day or less; reassembling the data may take longer

Method (8 Steps)

Brainstorm: identify all plausible hypotheses, working with analysts who bring different perspectives
List evidence: record all significant evidence and arguments relevant to any of the hypotheses
Build a matrix: hypotheses across the top, evidence down the side; rate each cell C (consistent), I (inconsistent), or N (neutral/not applicable)
Refine: reconsider the hypotheses, add new ones, and re-examine the information
Focus on disproving: tally the inconsistency scores; the hypotheses with the most (weighted) inconsistent evidence are eliminated first
Sensitivity analysis: if a few critical pieces of evidence proved wrong or deceptive, how would the results change?
Ask what’s missing: what evidence would be expected if a given hypothesis were true, and is it absent? Could denial or deception explain the gap?
Report all conclusions: include weaker hypotheses that should still be monitored
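The matrix, disconfirmation tally, and sensitivity steps can be sketched in a few lines of code. This is a minimal illustration under stated assumptions. The evidence items, hypotheses, and numeric weights are hypothetical, with 1.0 standing in for MEDIUM and 0.5 for LOW. Only "I" ratings reduce a score. The sensitivity check drops one evidence item at a time and reports whether the set of least-disconfirmed hypotheses changes. Ties are kept rather than broken.

```python
from typing import Dict, List, Set

# Ratings: "C" = consistent, "I" = inconsistent, "N" = neutral / not applicable.
Matrix = Dict[str, Dict[str, str]]  # evidence item -> {hypothesis -> rating}


def inconsistency_scores(matrix: Matrix, weights: Dict[str, float],
                         hypotheses: List[str]) -> Dict[str, float]:
    """Subtract each evidence item's weight from every hypothesis rated 'I' against it.
    'C' and 'N' ratings add nothing: only disconfirming evidence counts."""
    scores = {h: 0.0 for h in hypotheses}
    for evidence, row in matrix.items():
        for h in hypotheses:
            if row.get(h) == "I":
                scores[h] -= weights[evidence]
    return scores


def least_disconfirmed(scores: Dict[str, float]) -> Set[str]:
    """All hypotheses tied for the score closest to zero (ties are kept, not broken)."""
    best = max(scores.values())
    return {h for h, s in scores.items() if s == best}


def sensitivity(matrix: Matrix, weights: Dict[str, float],
                hypotheses: List[str]) -> Dict[str, bool]:
    """Leave-one-out check: does dropping a single evidence item change
    which hypotheses are least disconfirmed?"""
    baseline = least_disconfirmed(inconsistency_scores(matrix, weights, hypotheses))
    changed = {}
    for evidence in matrix:
        reduced = {e: row for e, row in matrix.items() if e != evidence}
        result = least_disconfirmed(inconsistency_scores(reduced, weights, hypotheses))
        changed[evidence] = result != baseline
    return changed


# Hypothetical toy data; weights use an assumed numeric scale (1.0 = MEDIUM, 0.5 = LOW).
hypotheses = ["H_A", "H_B", "H_C"]
matrix = {
    "E1": {"H_A": "C", "H_B": "I", "H_C": "I"},
    "E2": {"H_A": "I", "H_B": "C", "H_C": "I"},
    "E3": {"H_A": "C", "H_B": "I", "H_C": "C"},
}
weights = {"E1": 1.0, "E2": 1.0, "E3": 0.5}

scores = inconsistency_scores(matrix, weights, hypotheses)
print(scores)                                    # {'H_A': -1.0, 'H_B': -1.5, 'H_C': -2.0}
print(least_disconfirmed(scores))                # {'H_A'}
print(sensitivity(matrix, weights, hypotheses))  # {'E1': True, 'E2': False, 'E3': True}
```

In this toy matrix, H_A survives best, but the conclusion rests on E1 and E3. Removing either one changes the outcome, so those are the items whose reliability or possible deception deserves a second look.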

Diagnostic value of evidence: Evidence consistent with only one hypothesis is most valuable. Evidence consistent with all hypotheses has low diagnostic value.
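One way to make the diagnostic-value rule concrete is a small scoring function. This is an illustrative heuristic of my own, not a standard ACH formula. A row that rates every hypothesis the same scores 0.0. Otherwise the score is the number of "I" ratings divided by n - 1, so evidence that is consistent with one hypothesis and inconsistent with all the others scores 1.0.

```python
from typing import Dict


def diagnosticity(row: Dict[str, str]) -> float:
    """Illustrative heuristic (an assumption, not a standard ACH formula).

    Returns 0.0 when every hypothesis gets the same rating (the evidence
    cannot discriminate); otherwise the count of 'I' ratings divided by n - 1.
    """
    ratings = list(row.values())
    if len(ratings) < 2 or len(set(ratings)) == 1:
        return 0.0
    return ratings.count("I") / (len(ratings) - 1)


consistent_with_all = {"H1": "C", "H2": "C", "H3": "C"}
consistent_with_one = {"H1": "C", "H2": "I", "H3": "I"}
mixed = {"H1": "C", "H2": "C", "H3": "I"}

print(diagnosticity(consistent_with_all))  # 0.0
print(diagnosticity(consistent_with_one))  # 1.0
print(diagnosticity(mixed))                # 0.5
```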

Example: Tokyo Sarin Attack (March 1995)

Four hypotheses: kooky cult (H1), terrorist group (H2), political movement (H3), criminal group (H4)

Evidence	Weight	H1: Cult	H2: Terror	H3: Political	H4: Criminal
Attacks on journalists	MEDIUM	I	N	I	I
Religious affiliation	MEDIUM	C	I	I	I
Established party	MEDIUM	N	N	C	I
Blind leader Matsumoto	MEDIUM	C	C	C	C
Inconsistency Score		-1.0	-1.0	-2.0	-3.0

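Result: on the rows shown, the criminal group hypothesis (H4, -3.0) is most strongly disconfirmed. The cult (H1) and terrorist group (H2) hypotheses tie as least disconfirmed at -1.0, so these four rows alone cannot separate them. The "Blind leader Matsumoto" row is consistent with every hypothesis and therefore has no diagnostic value.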
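The tally in the Tokyo table can be checked mechanically. The sketch below copies the ratings from the table. It assumes each MEDIUM weight counts as 1.0, because the table does not state the numeric mapping. Each "I" subtracts the weight from that hypothesis, and the script then lists every hypothesis tied for the least-negative score.

```python
# Ratings copied from the Tokyo sarin table; each MEDIUM weight is assumed to count as 1.0.
HYPOTHESES = ["H1: Cult", "H2: Terror", "H3: Political", "H4: Criminal"]
ROWS = {
    "Attacks on journalists": ["I", "N", "I", "I"],
    "Religious affiliation": ["C", "I", "I", "I"],
    "Established party": ["N", "N", "C", "I"],
    "Blind leader Matsumoto": ["C", "C", "C", "C"],
}
MEDIUM = 1.0

totals = {h: 0.0 for h in HYPOTHESES}
for ratings in ROWS.values():
    for h, rating in zip(HYPOTHESES, ratings):
        if rating == "I":  # only inconsistent ratings count against a hypothesis
            totals[h] -= MEDIUM

best = max(totals.values())
tied = [h for h, s in totals.items() if s == best]
print(totals)  # {'H1: Cult': -1.0, 'H2: Terror': -1.0, 'H3: Political': -2.0, 'H4: Criminal': -3.0}
print(tied)    # ['H1: Cult', 'H2: Terror']
```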

Biases Primarily Controlled

Bias	How this technique counters it
Confirmation Bias	The disconfirmation focus is the structural core: you must try to disprove hypotheses rather than simply confirm a preferred one
Anchoring Bias	Building all hypotheses simultaneously before evaluating evidence prevents any single hypothesis from becoming an anchor
Availability Heuristic	Requires listing all hypotheses, including ones that don’t readily come to mind; the matrix forces equal treatment
Motivated Reasoning	Because every piece of evidence must be rated against every hypothesis, it is hard to reach a preferred conclusion through selective evidence weighting
Groupthink	Members of a small team challenge each other’s evidence ratings across all hypotheses

Applied in Cybersecurity

Threat Intelligence: helps ensure that all potential threat actors, motivations, and capabilities are rigorously evaluated (Riley: SATs in Cybersecurity (2024))
Forensic Investigators: consider all explanations for the evidence, guarding against confirmation bias
Risk Analysts: evaluate the likelihood of diverse threat scenarios

LLM Implementation (per Roberts: LLM SATs FTW (2025))

Scott Roberts found that ACH requires multi-step, sequential LLM queries. A single zero-shot prompt fails at ACH because the task is fundamentally evaluative: hypotheses must be generated before evidence can be evaluated against them. Roberts’ working implementation:

Query 1: the LLM generates competing hypotheses for the analytic question
Queries 2 to N+1: for each of the N hypotheses, a separate query generates evidence for and against it
Next M queries: for each of the M hypothesis-evidence pairs, a separate query scores the pair on a -5 to +5 scale
Post-processing: scores are totalled per hypothesis and the results exported as CSV
Human step: the analyst reviews the CSV, adds missing evidence, adjusts scores, and makes the final call
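The query sequence above can be sketched as a small pipeline. This is not Roberts’ code: the linked implementation uses Streamlit, GPT-4, LangChain, and Pydantic. Here those are replaced by plain callables so the sketch runs without an API key. The function and parameter names (`run_ach_pipeline`, `gen_hypotheses`, `gen_evidence`, `score_pair`) are hypothetical. Two further choices are assumptions: scoring every pooled evidence item against every hypothesis, which mirrors the full ACH matrix, and clamping scores to the -5..+5 range. Totalling and CSV export happen in ordinary code, outside the model.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

# Hypothetical LLM-backed callables; in a real app each would wrap one model call.
GenHypotheses = Callable[[str], List[str]]
GenEvidence = Callable[[str, str], List[str]]
ScorePair = Callable[[str, str], int]


def run_ach_pipeline(question: str, gen_hypotheses: GenHypotheses,
                     gen_evidence: GenEvidence, score_pair: ScorePair
                     ) -> Tuple[Dict[str, int], str, int]:
    """Run ACH as a sequence of small queries instead of one large prompt.

    Returns per-hypothesis totals, a CSV for analyst review, and the query count.
    """
    queries = 0

    # Query 1: hypotheses only, before any evidence is considered.
    hypotheses = gen_hypotheses(question)
    queries += 1

    # One query per hypothesis to elicit evidence for and against it; pool and de-duplicate.
    evidence: List[str] = []
    for h in hypotheses:
        for e in gen_evidence(question, h):
            if e not in evidence:
                evidence.append(e)
        queries += 1

    # One query per (hypothesis, evidence) cell; scores clamped to the -5..+5 scale.
    totals = {h: 0 for h in hypotheses}
    rows = []
    for h in hypotheses:
        for e in evidence:
            score = max(-5, min(5, score_pair(h, e)))
            queries += 1
            totals[h] += score
            rows.append({"hypothesis": h, "evidence": e, "score": score})

    # Post-processing happens in plain code, not in the model.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["hypothesis", "evidence", "score"])
    writer.writeheader()
    writer.writerows(rows)
    return totals, buf.getvalue(), queries


# Usage with deterministic stubs standing in for the LLM (hypothetical data).
fake_scores = {("H1", "e1"): 3, ("H1", "e2"): -2, ("H2", "e1"): -1, ("H2", "e2"): 9}
totals, csv_text, n_queries = run_ach_pipeline(
    "Who is behind the intrusion?",
    gen_hypotheses=lambda q: ["H1", "H2"],
    gen_evidence=lambda q, h: ["e1"] if h == "H1" else ["e1", "e2"],
    score_pair=lambda h, e: fake_scores[(h, e)],
)
print(totals)     # {'H1': 1, 'H2': 4}
print(n_queries)  # 7
```

Keeping each step as a separate call means the model never sees its own hypothesis list while scoring a single cell. The CSV gives the analyst a row-level audit trail to adjust before making the final call.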

“ACH is a complex SAT that takes teams hours or even days to complete… many teams of analysts struggle with it.” — Roberts

Architecture: Streamlit + GPT-4 + LangChain + Pydantic (structured output)
Live app: https://sat-ach.streamlit.app/ | Code: https://github.com/sroberts/talk-llm-sats-ftw-code/blob/main/experiment-2-ach.py

⚠ Warning: Single-prompt ACH is a known failure mode. When the model generates hypotheses and evaluates evidence in the same prompt, its own hypothesis generation primes the evaluation, which defeats the disconfirmation structure.

ACH as RAG Grounding (per suprathermal — ACH-Grounding)

A second independent open-source implementation (suprathermal, https://github.com/suprathermal/ACH-Grounding) converges on the same multi-step pattern but adds a sharper architectural claim: the LLM should only fill matrix cells; synthesis must be done by classical deterministic code.

Key design points:

Cell-level LLM calls — each (evidence, hypothesis) pair is scored in its own focused judgment, where the model is strong
Classical algorithms, not the LLM, compute likelihoods over the whole matrix. The repository argues that LLMs are unreliable at this step because of provable hallucination bounds on combinatorially complex problems (cites arXiv:2401.11817, arXiv:2508.01781)
ExtraH / ExtraE config — verified hypotheses and evidence can be injected as grounding constraints
Iterative RAG — previously generated hypotheses and evidence are fed back as context for subsequent calls
Cost model: O((|Evidence| + |Hypotheses|)²) tokens — practical only for “few hypotheses, few precious pieces of evidence, everything highly uncertain” cases
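The split between cell-level LLM judgments and deterministic synthesis can be sketched as follows. The repository’s actual synthesis algorithm is not described here. As a plainly named stand-in, the sketch computes a naive-Bayes posterior: a uniform prior, evidence assumed conditionally independent, and the calculation done in log space. `ground_ach`, `cell_likelihood`, `extra_h`, and `extra_e` are hypothetical names. The last two only echo the ExtraH/ExtraE idea of injecting verified items. Iterative RAG feedback is not modelled.

```python
import math
from itertools import product
from typing import Callable, Dict, Sequence

# Hypothetical: one focused LLM judgment of P(evidence | hypothesis) for a single cell.
CellLikelihood = Callable[[str, str], float]


def ground_ach(hypotheses: Sequence[str], evidence: Sequence[str],
               cell_likelihood: CellLikelihood,
               extra_h: Sequence[str] = (), extra_e: Sequence[str] = ()
               ) -> Dict[str, float]:
    """The LLM fills individual cells; deterministic code does all synthesis."""
    # Verified hypotheses/evidence are merged in as fixed grounding inputs.
    hs = list(hypotheses) + [h for h in extra_h if h not in hypotheses]
    es = list(evidence) + [e for e in extra_e if e not in evidence]

    # One call per (evidence, hypothesis) cell: |E| * |H| calls in total.
    # Values are clamped to (0, 1] so the logarithm below is always defined.
    cells = {(e, h): min(1.0, max(1e-6, cell_likelihood(e, h)))
             for e, h in product(es, hs)}

    # Stand-in synthesis: uniform prior, conditional independence, log space,
    # then normalisation to a posterior over hypotheses.
    log_post = {h: sum(math.log(cells[(e, h)]) for e in es) for h in hs}
    top = max(log_post.values())
    weights = {h: math.exp(v - top) for h, v in log_post.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Stubbed cell judgments (hypothetical). e2 is equally likely under both
# hypotheses, so, like non-diagnostic evidence in ACH, it leaves the posterior unchanged.
likelihood = {("e1", "H1"): 0.9, ("e1", "H2"): 0.3, ("e2", "H1"): 0.5, ("e2", "H2"): 0.5}
post = ground_ach(["H1"], ["e1", "e2"], lambda e, h: likelihood[(e, h)], extra_h=["H2"])
print({h: round(p, 3) for h, p in post.items()})  # {'H1': 0.75, 'H2': 0.25}
```

The model never sees more than one cell at a time. Any combinatorial reasoning across the matrix stays in code that can be inspected and rerun.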

Convergent finding: Two independent implementations (Roberts and suprathermal) arrived at the same architectural pattern, multi-step LLM calls plus externalized synthesis. Two data points are limited, but the convergence is empirical support for the principle that ACH-as-LLM-prompt fails and ACH-as-pipeline works.

This positions ACH not just as a debiasing technique but as a general LLM grounding mechanism: a structural pattern that forces exhaustive cross-referencing of every piece of evidence against every hypothesis.
