Report #82831
[counterintuitive] Model makes a reasoning error and fails to correct itself when asked to double-check or verify its answer
Use external verification — code execution, unit tests, formal validators, search results, or human review — instead of relying on the model to catch its own errors; self-correction without genuinely new external information is unreliable
Journey Context:
The intuition is compelling: if a human can catch their own mistake by re-reading their work, surely an LLM can too. But research demonstrates that LLMs cannot reliably self-correct without external feedback. The mechanism is straightforward: if the model's reasoning has a systematic blind spot (a common misconception baked into training data, a tokenization-induced error, a logical fallacy it consistently makes), re-prompting the same model to 'check' activates the same reasoning pathways with the same blind spots. The model tends to either stay confident in its wrong answer or, worse, change a correct answer to a wrong one when prompted to reconsider. Self-correction works only when the review step introduces genuinely new information (test results, compiler errors, search results) that the model did not have when it produced the original answer. Pure self-prompting ('are you sure?', 'think step by step and verify') is performative, not corrective. This is why tool-use patterns (generate code, run it, read the error, fix it) work far better than self-reflection loops, and why agentic architectures with real tool feedback tend to outperform pure chain-of-thought.
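The generate, run, read the error, fix pattern described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `verify`, `repair_loop`, `stub_model`, and `add_tests` are hypothetical names introduced here, and `stub_model` is a stand-in for a real model call. The point it shows is that the only thing fed back to the generator is the output of an external check, never a bare "are you sure?".

```python
from typing import Callable, Optional

# All names below are hypothetical and for illustration only.


def verify(source: str, tests: Callable[[dict], None]) -> Optional[str]:
    """Run candidate source plus tests; return None on success, else error text.

    NOTE: exec() on untrusted model output is unsafe. A real setup would run
    the candidate in a sandbox or a separate process.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)   # external signal 1: does the code even load?
        tests(namespace)          # external signal 2: do the tests pass?
    except Exception as exc:      # includes AssertionError and SyntaxError
        return f"{type(exc).__name__}: {exc}"
    return None


def repair_loop(
    generate: Callable[[Optional[str]], str],
    tests: Callable[[dict], None],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask the generator for code, feeding back only real verifier output."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        source = generate(feedback)
        feedback = verify(source, tests)
        if feedback is None:
            return source         # passed external verification
    return None                   # give up; escalate to human review


def stub_model(feedback: Optional[str]) -> str:
    """Stand-in for a model call: buggy first, fixed once it sees an error."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def add_tests(namespace: dict) -> None:
    """The external check: a plain unit test over the generated function."""
    assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"


if __name__ == "__main__":
    print(repair_loop(stub_model, add_tests))
```

When the loop exhausts `max_rounds` it returns `None` rather than asking the generator to reconsider once more, which leaves the decision to another external check such as human review.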
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:37:23.719770+00:00— report_created — created