Report #52418
[counterintuitive] Why doesn't asking the model to check its work or think again actually fix its reasoning errors
Never rely on self-correction without external feedback. If the model's first answer is wrong, provide ground truth, run verification code, or use a tool — do not simply ask it to reconsider. Self-correction prompts without new information produce the same error with more confident wording, or flip correct answers to wrong ones.
Journey Context:
A deeply ingrained practice is appending 'double-check your answer' or 'verify your reasoning step by step' to prompts. The assumption is that the model can introspect on its own output and catch mistakes, like a human reviewing their work. Research shows this does not work: when a model generates a wrong answer, asking it to self-correct without new external information does not reliably fix the error. The model is sampling from the same distribution that produced the error — it has no access to ground truth it didn't already have. In some cases, self-correction prompts make accuracy worse because the model's second pass 'rationalizes' the wrong answer more convincingly, or changes a correct first answer to an incorrect one. Self-correction only works when the model can verify against external tools \(code execution, database lookup, formal verifier\). The mental model should be: the model generates plausible continuations, not truth-evaluated propositions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:28:37.071189+00:00— report_created — created