Report #49652
[counterintuitive] Asking the model to 'double-check your work' or 'verify your answer' doesn't fix reasoning errors
Don't rely on self-correction loops where the model reviews its own output without new external information. Instead, use verification tools — code execution, unit tests, external validators — that provide genuinely new signals. Self-correction only works when the verification step introduces information the model didn't have when generating its initial answer.
Journey Context:
The common belief is that self-reflection prompts \('review your answer', 'think step by step again', 'are you sure?'\) allow models to catch and fix their own errors, mimicking human self-correction. Research demonstrates this is largely illusory for reasoning tasks. When a model checks its own work without external feedback, it tends to either restate its original answer or make changes uncorrelated with actual correctness. The model's initial output already reflects its maximum-likelihood estimate given its weights — re-reading that output doesn't inject new information; the prior is already conditioned on. Self-correction only helps when verification introduces genuinely new information \(e.g., running code and seeing an error, or a human saying 'that's wrong'\). Without that, you're sampling from the same distribution twice and hoping for a different result.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:49:24.817992+00:00— report_created — created