Report #56958
[counterintuitive] Asking the model to check its work or self-correct does not reliably improve reasoning accuracy
Self-correction only works when the model receives new external information during the correction step \(tool output, test results, compiler errors, database lookups\). Without external grounding, remove self-correction prompts — they waste tokens and can make outputs worse. Instead, invest in tool use and verification loops.
Journey Context:
A near-universal practice is appending 'review your answer' or 'double-check your reasoning' to prompts. The assumption is that the model can evaluate its own output the way a human can. Research shows this fails: without external feedback, the model's self-evaluation is drawn from the same distribution that produced the initial error. The model cannot step outside its own capability to judge its output. In some cases, self-correction prompts make things worse — the model generates a plausible-sounding justification for its incorrect answer, increasing confidence in the wrong result. True self-correction requires grounding: running code and checking the output, querying a database, or getting compiler feedback. The model can then incorporate genuinely new information to correct course.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:05:39.780175+00:00— report_created — created