Report #54219
[counterintuitive] Tell the model to check its work and it will find its own mistakes
Provide external verification mechanisms (code execution, test results, tool outputs) for self-correction. Pure textual self-correction without external feedback is unreliable. Structure verification as 'generate → execute → observe error → fix', not 'generate → reflect → regenerate'.
Journey Context:
A widespread practice is asking a model to 'review your answer' or 'check for mistakes' as a self-correction mechanism. Research shows this does not work reliably without external feedback. The model produces its initial answer from its internal representations; when asked to verify, it draws on the same representations and the same flawed reasoning that produced the error, with no access to ground truth or independent verification. It often regenerates the same wrong answer with more confidence, or introduces different errors while 'correcting'. Self-correction does work, however, when the model receives external feedback: error messages from code execution, test failures, or tool outputs that carry information the model did not have during initial generation. The key insight: self-correction fails when the model is its own oracle and succeeds when external reality grounds the correction. In practice, 'check your work' prompting is largely wasted tokens unless an execution environment or other external check is available.
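The 'generate → execute → observe error → fix' loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `execute`, `solve_with_feedback`, and `fake_model` are hypothetical names introduced here, and `fake_model` is a stand-in for a real model call that only produces the fix once an error message appears in its prompt.

```python
import traceback
from typing import Callable, Optional, Tuple

# Assumed shape of a model call: prompt in, source code out (hypothetical).
Generate = Callable[[str], str]


def execute(code: str, test: str) -> Optional[str]:
    """Run the candidate code, then the test, in one shared namespace.

    Returns None on success, or the traceback text on failure.
    NOTE: exec() is not sandboxed; untrusted model output should be run
    in an isolated environment instead.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(test, namespace)
    except Exception:
        return traceback.format_exc()
    return None


def solve_with_feedback(
    task: str, test: str, generate: Generate, max_rounds: int = 3
) -> Tuple[Optional[str], int]:
    """generate -> execute -> observe error -> fix, up to max_rounds times.

    Returns (passing_code, rounds_used), or (None, max_rounds) if no
    attempt passed the test.
    """
    prompt = task
    for round_no in range(1, max_rounds + 1):
        code = generate(prompt)
        error = execute(code, test)
        if error is None:
            return code, round_no
        # The external signal (the traceback) is what the model lacked
        # on the previous attempt, so it goes into the next prompt.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{code}\n\n"
            f"It failed with:\n{error}\nFix the code."
        )
    return None, max_rounds


def fake_model(prompt: str) -> str:
    """Stand-in model: buggy first answer, corrected once it sees an error."""
    if "It failed with:" in prompt:
        return "def mean(xs):\n    return sum(xs) / len(xs)\n"
    return "def mean(xs):\n    return sum(xs) / (len(xs) - 1)\n"


if __name__ == "__main__":
    code, rounds = solve_with_feedback(
        "Write mean(xs).", "assert mean([2, 4, 6]) == 4", fake_model
    )
    print(f"passed after {rounds} round(s)")
```

The contrast with 'generate → reflect → regenerate' is that the retry prompt contains the traceback from actually running the code, rather than only an instruction to look again. Whether a real model converges within `max_rounds` depends on the task and the quality of the test, which this sketch does not address.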
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:30:10.385091+00:00— report_created — created