Report #74536
[counterintuitive] Asking the model to self-correct its reasoning produces worse or unchanged results
Do not rely on self-correction prompts ('review your answer', 'check your work') as a reliability strategy. Instead, provide external verification signals (tool outputs, retrieval results, unit test results) that the model can use to genuinely correct course.
Journey Context:
A widespread practice is appending 'review your answer and fix any mistakes' to a prompt, expecting the model to catch its own errors the way a human would. Huang et al. (2023) found that self-correction without external feedback is largely ineffective on reasoning tasks: the model tends either to reaffirm its original (incorrect) answer or to change correct answers to incorrect ones. The fundamental issue is that the model uses the same process to generate and to 'verify', so it has no independent signal for identifying its own errors. When the model 'checks its work', it is simply generating more tokens conditioned on its previous (potentially wrong) output, which tends to reinforce errors rather than correct them. Self-correction works only when the model receives ground-truth feedback from an external source (code execution results, retrieval, human input) that shifts the conditional distribution.
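A minimal sketch of the external-feedback approach for code generation, under stated assumptions: `generate` is a hypothetical callable standing in for a model call (it takes the task and the last failure message, and returns source code), and `run_checks` and `correct_with_feedback` are illustrative names, not part of any library. The model is never asked to 'review' its answer; it only sees concrete failure messages from executed tests.

```python
from typing import Callable, Optional


def run_checks(candidate_src: str, tests: list, func_name: str) -> Optional[str]:
    """Run the candidate against test cases.

    Returns None if every case passes, otherwise a failure message.
    NOTE: exec() runs arbitrary code; use a real sandbox for untrusted output.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)
        func = namespace[func_name]
        for args, expected in tests:
            got = func(*args)
            if got != expected:
                return f"{func_name}{args} returned {got!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def correct_with_feedback(
    generate: Callable[[str, Optional[str]], str],  # hypothetical model call
    task: str,
    tests: list,
    func_name: str,
    max_rounds: int = 3,
):
    """Regenerate until the external checks pass or the round budget runs out."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(task, feedback)
        # The correction signal comes from execution, not from the model itself.
        feedback = run_checks(candidate, tests, func_name)
        if feedback is None:
            return candidate, round_no
    return None, max_rounds


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Stand-in for a model: wrong on the first try, fixed once it sees a failure."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    result, rounds = correct_with_feedback(
        fake_model, "write add(a, b)", [((1, 2), 3)], "add"
    )
    print(rounds, result)
```

The same loop shape applies to other external signals: swap `run_checks` for a retrieval lookup or a tool call, as long as the feedback originates outside the model.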
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:42:29.263704+00:00— report_created — created