Report #46309
[counterintuitive] Asking the model to self-correct or double-check its work without external feedback can degrade accuracy instead of improving it
Never rely on self-correction without external feedback. If the model cannot verify an answer via code execution, retrieval, or tool use, asking it to 'review your answer' or 'think again' will often reinforce errors or flip correct answers to wrong ones. Always provide an external verification path.
Journey Context:
The widespread belief is that asking an LLM to self-correct ('double-check your work', 'review your answer', 'are you sure?') improves accuracy by giving the model a second pass. Huang et al. (2023) showed that without external feedback, self-correction fails to improve reasoning performance and often degrades it. The model has no ground-truth signal to correct against, and its confidence is poorly calibrated, so it cannot reliably distinguish correct outputs from incorrect ones. A second pass therefore tends either to leave a wrong answer in place (reinforcing the error) or to change a correct answer to an incorrect one. This is especially pernicious because self-correction appears to work for shallow issues (formatting, style), which leads developers to over-trust it for factual and logical correction. Reliable self-correction requires an external feedback loop: code execution results, retrieval-grounded verification, or human review.
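A minimal sketch of the external-feedback loop described above, assuming the task is code generation and the external signal is the result of executing the candidate. All names here (solve_with_external_check, verify_add, make_scripted_model) are hypothetical and introduced only for illustration. The generate callable stands in for whatever model call is actually used, and the scripted model replaces a real one so the example runs offline.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical candidate answers used to script the stand-in model below.
BAD = "def add(a, b):\n    return a - b\n"
GOOD = "def add(a, b):\n    return a + b\n"


def verify_add(source: str) -> Optional[str]:
    """External check: execute the candidate and test it.

    Returns None if the check passes, otherwise a concrete error message.
    Assumption: exec() is used only because this is a toy example; untrusted
    model output should be run in a sandbox, not in-process.
    """
    namespace: Dict[str, object] = {}
    try:
        exec(source, namespace)
        func = namespace.get("add")
        if not callable(func):
            return "no callable named 'add' was defined"
        result = func(2, 3)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    if result != 5:
        return f"add(2, 3) returned {result!r}, expected 5"
    return None


def solve_with_external_check(
    generate: Callable[[str], str],
    verify: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate that passes `verify`, or None if none does."""
    current_prompt = prompt
    for _ in range(max_attempts):
        candidate = generate(current_prompt)
        error = verify(candidate)
        if error is None:
            return candidate
        # Retry with the concrete external signal, not a bare "are you sure?".
        current_prompt = (
            f"{prompt}\n\nPrevious attempt:\n{candidate}\n"
            f"External check failed with:\n{error}"
        )
    return None


def make_scripted_model(replies: List[str]) -> Callable[[str], str]:
    """Stand-in for a model call: returns the scripted replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


if __name__ == "__main__":
    model = make_scripted_model([BAD, GOOD])
    print(solve_with_external_check(model, verify_add, "Write add(a, b)."))
```

The design point is that the retry prompt carries the verifier's error message, so the model is correcting against a ground-truth signal rather than against its own confidence. When no candidate passes within max_attempts, the sketch returns None rather than an unverified answer; escalating to human review at that point is one option.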
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:12:11.468353+00:00— report_created — created