SAT Pipeline

Query: How do SATs compose into a complete analytic pipeline rather than isolated interventions?

Individual technique pages describe each SAT in isolation. This page describes how they chain together into end-to-end workflows for LLM agentic systems.

The Core Insight

SATs were designed as a toolkit — any one technique reduces bias in a specific way, but full analytic rigor requires multiple techniques applied in sequence. The sequence matters: each technique produces output that is the input for the next.

For LLM agents, this means the “SAT pipeline” is a multi-step orchestration pattern, not a single prompt. Scott Roberts’ ACH implementation demonstrated this: ACH cannot be a single prompt — it requires sequential calls where hypothesis generation, evidence generation, and scoring are separate queries. The same principle extends to full pipelines.

The Full Pipeline (Scoping → Analysis → Challenge → Forecast)

Phase 1: SCOPING
  Starbursting    → What questions do we need to answer?
       ↓
  Brainstorming   → What hypotheses / explanations exist?

Phase 2: ANALYSIS
  ACH             → Which hypotheses survive evidence disconfirmation?
       ↓
  Key Assumptions → What does the leading hypothesis depend on being true?
  Check
       ↓
  Quality of Info → Are our sources good enough to support this?
  Check

Phase 3: CHALLENGE
  Devil's Advocacy → What's the strongest case against the conclusion?
       ↓
  Red Team         → What would an adversary do with our conclusion?
  (if adversarial)

Phase 4: FORECAST / MONITOR
  Alternative      → What futures could emerge from this situation?
  Futures
       ↓
  Indicators /     → What signals will tell us which future is unfolding?
  Signposts
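
One way to make the ordering above explicit is to hold the stages as data, so an orchestrator can iterate over them in sequence. The following is a minimal sketch, not a reference implementation: the names `Stage`, `FULL_PIPELINE`, and `plan` are hypothetical, and treating Red Team as the only optional stage is an assumption read off the "(if adversarial)" note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    phase: str
    technique: str
    question: str
    optional: bool = False  # assumption: only Red Team is conditional


# Order matters: each stage's output feeds the next one.
FULL_PIPELINE = [
    Stage("Scoping", "Starbursting", "What questions do we need to answer?"),
    Stage("Scoping", "Brainstorming", "What hypotheses / explanations exist?"),
    Stage("Analysis", "ACH", "Which hypotheses survive evidence disconfirmation?"),
    Stage("Analysis", "Key Assumptions Check",
          "What does the leading hypothesis depend on being true?"),
    Stage("Analysis", "Quality of Info Check",
          "Are our sources good enough to support this?"),
    Stage("Challenge", "Devil's Advocacy",
          "What's the strongest case against the conclusion?"),
    Stage("Challenge", "Red Team",
          "What would an adversary do with our conclusion?", optional=True),
    Stage("Forecast / Monitor", "Alternative Futures",
          "What futures could emerge from this situation?"),
    Stage("Forecast / Monitor", "Indicators / Signposts",
          "What signals will tell us which future is unfolding?"),
]


def plan(adversarial: bool) -> list[Stage]:
    """Return the ordered stages, skipping optional ones for non-adversarial problems."""
    return [s for s in FULL_PIPELINE if adversarial or not s.optional]
```

Calling `plan(adversarial=False)` drops the Red Team stage and leaves the remaining stages in their original order.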

Minimal Viable Pipeline (3-step)

For most agentic tasks, the full pipeline (eight steps, or nine with Red Team) is overkill. The minimal high-value sequence:

1. Key Assumptions Check   → What assumptions is this reasoning built on?
2. Devil's Advocacy        → What's the strongest case against the conclusion?
3. Quality of Info Check   → Do our sources actually support this?

This 3-step sequence covers the broadest set of biases and LLM failure modes (Confirmation Bias, Anchoring Bias, Motivated Reasoning, Overconfidence Bias, Sycophancy, Hallucination) with minimal orchestration complexity.

LLM Orchestration Architecture

Pattern A: Sequential Single-Agent

One agent processes all stages sequentially. Each SAT stage is a separate call; the agent receives the prior stage’s output as context.

Call 1 [Starbursting]: "What questions should we ask about [TOPIC]?"
  → output: question map

Call 2 [Brainstorming]: "Given these questions, what are the competing
  hypotheses? Generate all possibilities before evaluating any."
  → output: hypothesis list

Call 3–N [ACH per hypothesis]: "For hypothesis [H], what evidence
  confirms or disconfirms it? Rate each: C/I/N."
  → output: evidence matrix

Call N+1 [KAC]: "What assumptions does the top hypothesis depend on?
  Rate each assumption's confidence."
  → output: assumption list

Call N+2 [Devil's Advocacy]: "Build the strongest case AGAINST
  the top hypothesis."
  → output: adversarial critique

Call N+3 [Synthesis]: "Given all of the above, what is the most
  defensible conclusion? Where should a human reviewer focus?"
  → output: final judgment + review flags

Pro: Simple to implement; no multi-agent infrastructure needed.
Con: Self-consistency pressure — the same model processes all stages; later stages are influenced by earlier outputs, which may introduce subtle anchoring.
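
The call sequence for Pattern A can be sketched as a short loop over stages. This is an illustration under stated assumptions, not Roberts' implementation: `LLM` is a hypothetical prompt-in, text-out callable standing in for whatever model client is used, `run_pattern_a` is a made-up name, and the brainstorming call is assumed to return one hypothesis per line.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def run_pattern_a(topic: str, llm: LLM) -> list[tuple[str, str]]:
    """Run the SAT stages as separate calls; return (stage, output) pairs."""
    transcript: list[tuple[str, str]] = []

    def stage(name: str, prompt: str) -> str:
        # Every call receives all prior stage outputs as context.
        context = "\n\n".join(f"[{n}]\n{out}" for n, out in transcript)
        output = llm(f"{context}\n\n{prompt}".strip())
        transcript.append((name, output))
        return output

    stage("Starbursting", f"What questions should we ask about {topic}?")
    raw = stage(
        "Brainstorming",
        "Given these questions, what are the competing hypotheses? "
        "Generate all possibilities before evaluating any. "
        "Return one hypothesis per line.",
    )
    # Assumption: the model follows the one-per-line format.
    hypotheses = [ln.strip("-*• ").strip() for ln in raw.splitlines() if ln.strip()]
    for h in hypotheses:  # Calls 3..N: one ACH call per hypothesis
        stage(f"ACH: {h}",
              f"For hypothesis '{h}', what evidence confirms or "
              "disconfirms it? Rate each: C/I/N.")
    stage("KAC", "What assumptions does the top hypothesis depend on? "
                 "Rate each assumption's confidence.")
    stage("Devil's Advocacy",
          "Build the strongest case AGAINST the top hypothesis.")
    stage("Synthesis",
          "Given all of the above, what is the most defensible conclusion? "
          "Where should a human reviewer focus?")
    return transcript
```

Because every later call sees the full transcript, this sketch also makes the anchoring risk concrete: whatever the first two stages produce is carried into every subsequent prompt.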

Pattern B: Parallel Independent Analysis + Adversarial Review

Multiple independent agents run on the same problem without seeing each other’s outputs; an adversarial agent then critiques all results.

[Agent A: Analyst]    ─┐
[Agent B: Analyst]    ─┼─→ [Agent D: Devil's Advocate] → [Synthesis Agent]
[Agent C: Skeptic]    ─┘

Implementation:

1. Run Agent A (primary analysis, ACH).
2. Run Agent B (independent ACH: same task, same data, no shared context).
3. Run the Adversarial Agent, which receives A’s and B’s conclusions and is tasked with maximum critique.
4. Run the Synthesis Agent, which receives all three outputs and produces a final judgment with explicit disagreement flags.

Pro: Counters Groupthink and self-consistency anchoring.
Con: 3–4× compute cost; requires an orchestration layer (e.g., LangChain, LangGraph, CrewAI).

Key requirement: Agents A and B must have no shared context for their analyses. If they see each other’s outputs before generating, Pattern B degrades to Pattern A.
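
The isolation requirement can be enforced structurally: give both analysts the same prompt and only hand their outputs to the downstream agents. The sketch below follows the four implementation steps using only the Python standard library. The function name `run_pattern_b`, the `LLM` callable type, and the prompt wording are all hypothetical; a real system would substitute its own model clients or framework.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def run_pattern_b(task: str, analyst_a: LLM, analyst_b: LLM,
                  adversary: LLM, synthesizer: LLM) -> dict[str, str]:
    """Two isolated analyses, then adversarial critique, then synthesis."""
    analysis_prompt = (
        "Run ACH on the following task and state your conclusion.\n\n" + task
    )
    # A and B receive an identical prompt and nothing else, so neither
    # can see the other's output before generating its own.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(analyst_a, analysis_prompt)
        future_b = pool.submit(analyst_b, analysis_prompt)
        a, b = future_a.result(), future_b.result()

    critique = adversary(
        "Critique these conclusions as strongly as possible. Do not hedge."
        f"\n\n[Analyst A]\n{a}\n\n[Analyst B]\n{b}"
    )
    final = synthesizer(
        "Produce a final judgment and flag every disagreement explicitly."
        f"\n\n[Analyst A]\n{a}\n\n[Analyst B]\n{b}\n\n[Critique]\n{critique}"
    )
    return {"a": a, "b": b, "critique": critique, "final": final}
```

Passing distinct callables for each role leaves room to use a different model or temperature for the adversary than for the analysts.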

Pattern C: Post-Hoc Audit (KAC + Devil’s Advocacy on Finished Output)

For cases where you already have an output (e.g., a drafted report, a model recommendation) and want to apply SATs retrospectively:

Input: finished output

Call 1 [KAC]: "List every assumption in this analysis. Rate confidence
  of each. What would make each assumption false?"

Call 2 [Devil's Advocacy]: "Build the strongest case that this
  analysis is wrong or incomplete."

Call 3 [Quality Check]: "What sources support the claims in this
  analysis? What claims are unsupported? What's missing?"

This matches Roberts’ Key Assumptions Check implementation (applied to a finished Strider intelligence report). It’s the lowest-friction pipeline entry point: it requires no upfront process change, just retrospective review.
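
A post-hoc audit of this kind needs very little code. The sketch below reuses the three prompts listed above. It is not Roberts' implementation: `audit`, `AUDIT_PROMPTS`, and the `LLM` callable are hypothetical names, and running the three calls independently (each sees only the finished output, not the other audit results) is an assumption made here to limit anchoring between checks.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out

AUDIT_PROMPTS = {
    "KAC": "List every assumption in this analysis. Rate confidence of each. "
           "What would make each assumption false?",
    "Devil's Advocacy": "Build the strongest case that this analysis is "
                        "wrong or incomplete.",
    "Quality Check": "What sources support the claims in this analysis? "
                     "What claims are unsupported? What's missing?",
}


def audit(finished_output: str, llm: LLM) -> dict[str, str]:
    """Apply each audit prompt to the finished output in a separate call."""
    return {
        name: llm(f"{prompt}\n\n--- ANALYSIS ---\n{finished_output}")
        for name, prompt in AUDIT_PROMPTS.items()
    }
```

The returned dictionary keeps the three critiques separate so a human reviewer can read each on its own.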

Failure Modes of the Pipeline Itself

Failure Mode	Cause	Mitigation
Analysis paralysis	Too many SAT stages; the process becomes the product	Timebox each stage; use minimal viable pipeline
Anchoring on Stage 1	Early Starbursting/Brainstorming output shapes all downstream reasoning	Regenerate hypotheses independently before ACH; don’t show Stage 1 output to ACH agent
False confidence from process	“We ran the SAT pipeline, therefore the conclusion is sound”	The pipeline reduces bias but doesn’t eliminate it; human review of the adversarial critique is non-negotiable
Sycophantic devil’s advocacy	The adversarial agent produces mild critique rather than genuine challenge	Explicit system prompt: “Your critique should be as strong as possible. Do not hedge.” Use a different model or temperature from the primary agent
Cross-chunk context loss	Long documents chunked for KAC lose cross-chunk context	Summarize full document before chunking; run a final consolidation pass (see Roberts: LLM SATs FTW (2025))
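
The mitigation in the last row (summarize, chunk, then consolidate) can be sketched as follows. This is an illustration only: `chunk_text`, `kac_over_long_document`, the `LLM` callable, the prompt wording, and the character-based chunk size are all hypothetical, and the sketch assumes the full document fits in a single summarization call.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars where possible."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def kac_over_long_document(document: str, llm: LLM, max_chars: int = 4000) -> str:
    # 1. Summarize the whole document before chunking.
    summary = llm(f"Summarize this document in one paragraph.\n\n{document}")
    # 2. Run KAC per chunk, with the summary prepended as shared context.
    per_chunk = [
        llm(f"Document summary:\n{summary}\n\n"
            f"List every assumption in this excerpt.\n\n{chunk}")
        for chunk in chunk_text(document, max_chars)
    ]
    # 3. Final consolidation pass over all per-chunk results.
    return llm("Merge these assumption lists, removing duplicates.\n\n"
               + "\n\n".join(per_chunk))
```

A single paragraph longer than `max_chars` is kept whole in this sketch rather than split mid-paragraph.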

Mapping Pipelines to System 2

The pipeline is a System 2 scaffold:

System 1 (LLM default): single-prompt completion → fast, fluent, biased
System 2 (SAT pipeline): structured multi-step process → slower, more effortful, less biased

Just as humans default to System 1 unless explicitly required to engage System 2, LLMs default to single-step completion unless prompted into multi-step structured reasoning. The pipeline is the mechanism for that prompting.
