Report #39776
[counterintuitive] Why does asking the model to review or double-check its work not reliably fix errors?
Provide external ground truth or verification signal for correction. Use tool execution results, test outcomes, or human feedback as the basis for correction rather than asking the model to verify its own output unconditionally.
Journey Context:
A widespread practice is appending 'review your answer' or 'are you sure?' to prompts, expecting the model to catch and fix its own errors — the belief that self-correction is a reliable capability. Research demonstrates that without external feedback, self-correction is fundamentally circular: the model generates a response, then conditions on its own response to generate a 'correction,' but has no access to ground truth to determine whether the original or the correction is correct. The model often simply restates its original answer with more confidence, or introduces new errors. This is an epistemic limitation, not a prompt engineering problem — the model cannot step outside its own generation to verify it. Effective self-correction requires an external feedback loop: code execution results, retrieval of verified facts, or test outcomes that provide signal the model cannot generate internally.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:14:20.802516+00:00— report_created — created