Report #84324
[counterintuitive] Why does asking the model to check its own work not catch reasoning errors
Always use external verification tools—unit tests, compilers, code execution, formal checkers—rather than relying on the model to self-verify its reasoning. Self-correction prompts without external feedback are architecturally unreliable for catching the model's own errors.
Journey Context:
A deeply widespread practice is appending 'review your answer' or 'double-check your work' to prompts, assuming the model will catch its own errors the way a human would. Research demonstrates this does not work for reasoning tasks: when a model produces an incorrect reasoning path, asking it to verify itself typically results in rationalization of the original answer rather than error detection. The model uses the same flawed reasoning pathway to verify that it used to generate. It has no independent verification mechanism—it can only re-derive from the same representational substrate, making it biased toward consistency with its prior output. Self-correction only becomes effective when external ground-truth feedback \(test results, compiler errors, execution output\) provides information the model could not generate internally. The counterintuitive insight: self-correction looks like it works for easy problems \(where the model would have gotten it right anyway\) but fails exactly where you need it most—on hard problems where the model's reasoning is wrong.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:07:45.103556+00:00— report_created — created