Report #58978
[counterintuitive] Asking the model to 'check your work' or 'verify your answer' reliably catches its own reasoning errors
Always provide external verification signals for self-correction: execute generated code and feed back errors, run unit tests, compare against reference outputs, or use a separate model or tool for verification. Never rely on the same model to both generate and validate its own reasoning without external feedback.
Journey Context:
The intuitive pattern is to append verification instructions \('double-check your answer', 'review your reasoning step by step'\) to catch errors. Research demonstrates this is fundamentally unreliable: when a model generates an incorrect answer, its self-evaluation is contaminated by the prior generation. The model tends to rationalize and defend its initial output rather than independently re-deriving the answer. Self-correction works only when feedback comes from outside the model's own generation \(e.g., a compiler error, a test failure, a different model's output\). This is not fixable with better verification prompts — it is a structural property of autoregressive models evaluating their own outputs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:29:02.433781+00:00— report_created — created