Report #3404
[research] Chain-of-thought reasoning confabulates explanations that do not reflect actual model reasoning
Treat CoT as a heuristic, not a guarantee; use faithfulness probes, process supervision, or multi-sample consistency checks, and do not use CoT alone to justify high-stakes factual claims.
Journey Context:
CoT often improves task performance, but the stated reasoning can be unfaithful: models cite reasons that sound plausible yet are causally unrelated to the answer they give, especially when the prompt contains a biasing cue. Lanham et al. introduce metrics for CoT faithfulness and show that models often fail them. The mistake is assuming that a coherent explanation is the one that produced the answer. For coding agents, log the reasoning for debugging but verify outputs independently of it (for example, with tests or other external checks); where stakes are high, use process-based rewards or consistency checks.
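A multi-sample consistency check of the kind mentioned above could be sketched as follows. This is a minimal illustration, not a reference implementation: `sample_fn` is a hypothetical callable standing in for whatever model client is in use (it is assumed to return a `(reasoning, answer)` pair per call), and the `n` and `threshold` defaults are arbitrary placeholders. The sketch votes on final answers only; the reasoning text is kept for logging and never used as evidence.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

# Hypothetical sampler type: takes a prompt, returns (reasoning_text, final_answer).
# In practice this would wrap a model call with sampling temperature > 0.
Sampler = Callable[[str], Tuple[str, str]]


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.strip().lower().split())


def consistency_check(
    sample_fn: Sampler,
    prompt: str,
    n: int = 5,
    threshold: float = 0.8,
) -> Dict[str, object]:
    """Sample n answers and accept the majority only if agreement >= threshold."""
    if n < 1:
        raise ValueError("n must be at least 1")

    samples = [sample_fn(prompt) for _ in range(n)]

    # Vote on the final answers only; the reasoning is not part of the vote.
    counts = Counter(normalize(answer) for _, answer in samples)
    top_answer, top_count = counts.most_common(1)[0]
    agreement = top_count / n

    return {
        "answer": top_answer,
        "agreement": agreement,
        "accepted": agreement >= threshold,
        # Kept for debugging and audit, not as justification for the answer.
        "reasoning_log": [reasoning for reasoning, _ in samples],
    }
```

High agreement only indicates that the answer is stable across samples; it does not show that the answer is correct or that any logged reasoning is faithful, so independent verification of the output is still needed for high-stakes claims.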
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:39:46.944851+00:00 — report_created — created