Report #55275
[counterintuitive] Why doesn't asking the model to check its own work actually fix its reasoning errors?
Don't rely on self-correction loops in which the model reviews its own output without external grounding. Instead, provide verification tools (code execution, unit tests, formal checkers, or external APIs) that give the model objective feedback on whether its answer is correct.
Journey Context:
The common pattern is to prompt a model to 'think step by step, then verify your answer', or to implement multi-turn self-correction in which the model reviews its own output. Huang et al. (2023) demonstrated that without external feedback, LLM self-correction does not reliably improve reasoning accuracy. The model tends either to repeat its original (incorrect) answer or to change correct answers to incorrect ones. The intuition: the model's internal representation already produced the error, so asking it to re-examine that same representation without new information doesn't break the error cycle. The model is effectively asking itself 'am I wrong?' using the same process that generated the wrong answer. What DOES work is providing external verification: running code to check math, executing unit tests, using formal verification tools. The model can effectively USE feedback; it cannot effectively GENERATE corrective feedback for its own reasoning.
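A minimal sketch of the externally grounded loop described above, assuming the model is wrapped in a callable that receives the task plus the most recent verifier output. The names `Proposer`, `run_tests`, `solve_with_verification`, and `stub_model` are hypothetical and introduced only for illustration; `stub_model` stands in for a real LLM call so the example runs on its own. The point it illustrates is that the feedback comes from executing tests, not from the model re-reading its own answer.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# Hypothetical interface: a "model" takes the task description and the last
# external feedback (None on the first attempt) and returns candidate source.
Proposer = Callable[[str, Optional[str]], str]
TestCase = Tuple[tuple, Any]  # (positional args, expected return value)


def run_tests(source: str, tests: Sequence[TestCase], func_name: str) -> Optional[str]:
    """External verifier: run the candidate and report the first failure.

    Returns None when every test passes, otherwise a short error message.
    """
    namespace: dict = {}
    try:
        # Assumption: trusted input. Real model output should run in a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
        for args, expected in tests:
            got = func(*args)
            if got != expected:
                return f"{func_name}{args} returned {got!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def solve_with_verification(
    propose: Proposer,
    task: str,
    tests: Sequence[TestCase],
    func_name: str,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Retry until the external verifier passes or the round budget runs out."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = propose(task, feedback)
        feedback = run_tests(candidate, tests, func_name)
        if feedback is None:
            return candidate, round_no
    return None, max_rounds


def stub_model(task: str, feedback: Optional[str]) -> str:
    """Stand-in for an LLM: wrong at first, fixed once it sees test feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0)]
    candidate, rounds = solve_with_verification(stub_model, "write add(a, b)", tests, "add")
    print(f"accepted after {rounds} round(s)")
    print(candidate)
```

In this sketch the loop stops on a passing test run, never on the model's own statement that the answer looks right. The same shape would apply with a formal checker or an external API in place of `run_tests`.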
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:16:19.168455+00:00— report_created — created