Report #49816
[counterintuitive] Why doesn't asking the model to self-correct or check its work actually fix errors?
For self-correction to work, the model needs access to external feedback: tool results, unit test output, compiler errors, or verifier signals. Pure text-only self-correction — asking the model to re-examine its own output without new information — is unreliable and often degrades performance. Always pair self-correction with an external ground-truth signal.
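A minimal sketch of what pairing self-correction with an external ground-truth signal can look like, assuming the task is to produce a Python function named `add`. The names `run_tests`, `correct_with_feedback`, and `stub_generate` are hypothetical and only for illustration; `stub_generate` stands in for a real model call, which is not shown here.

```python
from typing import Callable, Optional

# Hypothetical generator signature: receives the task and the latest external
# feedback (None on the first attempt) and returns candidate source code.
Generator = Callable[[str, Optional[str]], str]


def run_tests(source: str) -> Optional[str]:
    """External signal: run the candidate and return an error message,
    or None if every check passes."""
    namespace: dict = {}
    try:
        # exec is used only to keep the sketch short; untrusted model output
        # should be run in a sandbox instead.
        exec(source, namespace)
        add = namespace["add"]
        for a, b, expected in [(1, 2, 3), (-1, 1, 0), (0, 0, 0)]:
            got = add(a, b)
            if got != expected:
                return f"add({a}, {b}) returned {got}, expected {expected}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def correct_with_feedback(
    task: str, generate: Generator, max_rounds: int = 3
) -> Optional[str]:
    """Retry only while the tests fail, passing the failure message back."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        feedback = run_tests(candidate)
        if feedback is None:
            return candidate  # accepted by the tests, not by the model's own opinion
    return None  # give up rather than trust an unverified answer


def stub_generate(task: str, feedback: Optional[str]) -> str:
    """Stand-in for a model: wrong first, fixed once it sees test output."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    print(correct_with_feedback("write add(a, b)", stub_generate))
```

The design choice that matters is that the stopping condition and the retry prompt both come from `run_tests`, so each round carries information the generator did not have before.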
Journey Context:
A widespread practice is asking models to 'think again' or 'verify your answer.' Huang et al. (2023) showed that without external feedback, LLM self-correction is essentially the model sampling from its own distribution again: it has no new information with which to break out of its error. If the model could reliably distinguish its correct outputs from its incorrect ones using only its own reasoning, it would likely not have made the error in the first place. Self-correction works when the model can execute code, call tools, or receive ground-truth signals, because it then has genuinely new information. Without that, you are re-rolling the same biased process, and the model may confidently re-derive the same wrong answer or 'correct' a right answer into a wrong one.
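The "re-rolling" argument can be illustrated with a toy simulation. This is a deliberately simplified model, not a reproduction of any published experiment. It assumes each attempt is an independent draw that is correct with probability `p_correct`, and that the verifier is perfect. All function names are hypothetical. Under these assumptions blind retrying leaves accuracy at `p_correct`, while verifier-gated retrying raises it to `1 - (1 - p_correct) ** (rounds + 1)`.

```python
import random


def attempt(rng: random.Random, p_correct: float) -> bool:
    """One draw from the model's answer distribution; True means correct."""
    return rng.random() < p_correct


def blind_retry(rng: random.Random, p_correct: float, rounds: int) -> bool:
    """'Think again' with no new information: each round replaces the
    answer with a fresh draw from the same distribution."""
    answer = attempt(rng, p_correct)
    for _ in range(rounds):
        answer = attempt(rng, p_correct)
    return answer


def verified_retry(rng: random.Random, p_correct: float, rounds: int) -> bool:
    """Retry only when an external verifier (assumed perfect) rejects."""
    answer = attempt(rng, p_correct)
    for _ in range(rounds):
        if answer:  # verifier accepts, so a correct answer is never overwritten
            break
        answer = attempt(rng, p_correct)
    return answer


def accuracy(strategy, p_correct=0.6, rounds=2, trials=20000, seed=0) -> float:
    """Fraction of trials in which the strategy ends on a correct answer."""
    rng = random.Random(seed)
    hits = sum(strategy(rng, p_correct, rounds) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print("blind retry:   ", accuracy(blind_retry))
    print("verified retry:", accuracy(verified_retry))
```

The toy model is, if anything, generous to blind retrying: it does not capture correlated errors or the case where a right answer is revised into a wrong one, both of which the text above describes.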
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:05:40.994220+00:00— report_created — created