Report #86090
[counterintuitive] Asking the model to self-correct or double-check its work improves accuracy on reasoning tasks
Provide external verification for self-correction: code execution results, test outcomes, tool feedback, or human evaluation. Do not rely on the model self-correcting in a vacuum. If you ask 'are you sure?', always pair it with an external check the model can use to actually verify its answer.
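A minimal sketch of the pattern, assuming the model call is wrapped in a callable. The names `correct_with_external_check`, `run_tests`, and `stub_model` are hypothetical and exist only for this illustration; `stub_model` stands in for a real model and `run_tests` for whatever external check is available (a test suite, tool output, or human review). The point is that each revision is driven by feedback from outside the model, not by a bare 'are you sure?'.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures for this sketch (not from any real library):
#   generate(task, feedback) -> candidate answer; stands in for a model call
#   verify(candidate)        -> (passed, feedback) produced by an external check
Generate = Callable[[str, Optional[str]], str]
Verify = Callable[[str], Tuple[bool, str]]


def correct_with_external_check(
    task: str, generate: Generate, verify: Verify, max_rounds: int = 3
) -> Tuple[str, int, bool]:
    """Ask for a revision only when an external check has produced feedback."""
    feedback: Optional[str] = None
    candidate = ""
    for round_no in range(1, max_rounds + 1):
        candidate = generate(task, feedback)
        passed, feedback = verify(candidate)
        if passed:
            return candidate, round_no, True
    # Give up after max_rounds and report that verification never passed.
    return candidate, max_rounds, False


def run_tests(source: str) -> Tuple[bool, str]:
    """External check: execute the candidate code and test its behaviour."""
    namespace: dict = {}
    try:
        # Illustration only: run untrusted code in a sandbox in practice.
        exec(source, namespace)
        result = namespace["add"](2, 3)
    except Exception as exc:
        return False, f"execution failed: {exc!r}"
    if result != 5:
        return False, f"add(2, 3) returned {result}, expected 5"
    return True, "all tests passed"


def stub_model(task: str, feedback: Optional[str]) -> str:
    """Stand-in for a model: wrong at first, corrected once feedback arrives."""
    if feedback is None:
        return "def add(a, b):\n    return a - b"
    return "def add(a, b):\n    return a + b"


answer, rounds, ok = correct_with_external_check(
    "write add(a, b)", stub_model, run_tests
)
print(ok, rounds)  # True 2
```

With the stub, the first candidate fails the test, the failure message is passed back as feedback, and the second candidate passes. If no external check is available, this loop has nothing to ground a revision on, which is the situation the report warns against.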
Journey Context:
The intuitive belief is strong: humans improve when they check their work, so models should too. But research shows that without external feedback, self-correction on reasoning tasks either leaves accuracy unchanged or lowers it. The model tends either to rationalize its initial wrong answer or to switch to a different wrong answer with equal confidence. True self-correction requires grounding in external information: the model needs something outside its own generation to break out of its initial reasoning path. This is a fundamental limitation of autoregressive models, which cannot step outside their own distribution to evaluate it. The one exception is output-format self-correction (e.g., 'make it shorter'), which works because it doesn't require verifying factual correctness.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:05:30.634806+00:00— report_created — created