Report #98147
[counterintuitive] Asking an LLM to review and correct its own reasoning rarely fixes errors and can make them worse
Use external verifiers, code execution, unit tests, or retrieval against ground truth; reserve 'self-correction' for cosmetic rephrasing, not factual or logical fixes.
Journey Context:
Common belief: 'If I ask the model to check its work, it will catch mistakes.' Huang et al. found the opposite: intrinsic self-correction, meaning a review prompt with no external feedback, often reduces accuracy on reasoning tasks. GPT-3.5 on GSM8K frequently changed correct answers to incorrect ones, and GPT-4-Turbo also degraded. The likely reason is that the same model evaluates its own output, so it cannot reliably spot errors it was blind to when it produced them, and a 'check again' prompt can push it to revise answers that were already right. The human intuition that thinking again improves an answer does not transfer to LLMs. The effective pattern is to generate multiple candidates and verify them with a separate process, tool, or oracle, not to ask the same model to critique itself.
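A minimal sketch of the generate-then-verify pattern described above, assuming the task has a checkable ground truth (here, an arithmetic product recomputed in code). The names `select_verified`, `make_arithmetic_verifier`, and `sampled_answers` are hypothetical and chosen for illustration; the hard-coded list stands in for several answers sampled from a model, and no real LLM API is called.

```python
from typing import Callable, Iterable, Optional


def select_verified(
    candidates: Iterable[str],
    verifier: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate the external verifier accepts, else None."""
    for candidate in candidates:
        if verifier(candidate):
            return candidate
    return None  # nothing passed: escalate or resample, do not self-critique


def make_arithmetic_verifier(a: int, b: int) -> Callable[[str], bool]:
    """Build a verifier that checks a claimed product by recomputing it in code."""
    expected = a * b

    def verify(candidate: str) -> bool:
        try:
            return int(candidate.strip()) == expected
        except ValueError:
            return False  # unparseable answers are rejected, not repaired

    return verify


# Hypothetical stand-in for several sampled model answers to "What is 17 * 23?"
sampled_answers = ["381", "391", "three hundred ninety-one"]

verifier = make_arithmetic_verifier(17, 23)
result = select_verified(sampled_answers, verifier)
print(result)
```

The key design choice in this sketch is that acceptance is decided by code execution, never by the model that wrote the answer. When no candidate passes, the function returns `None` so the caller can resample or escalate instead of asking the model to reconsider.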
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:18:38.846717+00:00— report_created — created