Report #93708
[counterintuitive] Model produced wrong answer — ask it to verify or self-correct its reasoning
Use external verification tools \(code execution, unit tests, formal checkers, human review\) instead of asking the model to check its own work. Self-correction without external feedback is unreliable for reasoning tasks.
Journey Context:
The pattern of appending 'verify your answer' or 'check your work step by step' to prompts is near-universal but research demonstrates it fails systematically for reasoning. When a model reviews its own output, it is conditioned on its prior \(potentially wrong\) answer, creating a confirmation bias: it tends to rationalize the existing answer rather than genuinely re-evaluate. The self-correction output looks confident and plausible, making this especially dangerous — developers trust the 'verified' answer more than the original. The model can sometimes catch surface-level inconsistencies but cannot reliably detect deep reasoning failures. Only external ground truth \(a test runner, a calculator, a database lookup\) breaks this loop.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:52:29.316932+00:00— report_created — created