Report #56344
[counterintuitive] Why self-correction prompting doesn't fix model reasoning errors
Do not rely on self-correction without external feedback. Instead, provide an external verification signal (unit tests, a calculator, a reference implementation) or re-run the model from scratch with a differently framed prompt. Self-correction loops without ground truth degrade into rationalization.
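The recommendation above can be sketched as a small acceptance gate. This is a minimal illustration, not a definitive implementation: `stub_generate` is a hypothetical stand-in for a real model call, and `REFERENCE_CASES`, `verify`, and `first_verified` are names invented for this example. The point is that each prompt is a fresh attempt, and an answer is accepted only when a check outside the model passes it.

```python
from typing import Callable, Optional, Sequence

Candidate = Callable[[int], int]

# Known-good input/output pairs for Fibonacci. These act as the external
# truth signal; the model never grades its own answer.
REFERENCE_CASES = [(0, 0), (1, 1), (2, 1), (10, 55)]


def verify(candidate: Candidate) -> bool:
    """Return True only if the candidate matches every reference case."""
    try:
        return all(candidate(n) == expected for n, expected in REFERENCE_CASES)
    except Exception:
        return False  # a crash counts as a failed check


def first_verified(
    generate: Callable[[str], Candidate], prompts: Sequence[str]
) -> Optional[Candidate]:
    """Try each prompt as an independent attempt; accept only verified output."""
    for prompt in prompts:
        candidate = generate(prompt)  # fresh attempt, no "check your work" turn
        if verify(candidate):
            return candidate
    return None  # nothing passed: escalate instead of trusting an unverified answer


def stub_generate(prompt: str) -> Candidate:
    """Hypothetical stand-in for a model call (assumption, not a real API)."""
    if "iteratively" in prompt:
        def fib(n: int) -> int:
            a, b = 0, 1
            for _ in range(n):
                a, b = b, a + b
            return a
        return fib
    return lambda n: n  # plausible-looking but wrong for n >= 2


if __name__ == "__main__":
    prompts = ["Write fib(n).", "Write fib(n) iteratively."]
    accepted = first_verified(stub_generate, prompts)
    print("verified candidate found:", accepted is not None)
```

In a real pipeline the reference cases would be replaced by whatever ground truth is available (a test suite, a calculator, a reference implementation). Returning `None` when nothing passes is deliberate: an unverified answer is reported as unverified, not polished by another self-review turn.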
Journey Context:
The widespread belief is that telling a model to 'check your work' or 're-examine your answer step by step' leads to genuine self-correction. Huang et al. (2023) demonstrated that without external feedback, LLM self-correction is largely ineffective: the model either keeps its wrong answer or switches to a different wrong answer with similar probability. The model cannot step outside its own reasoning to evaluate it objectively; it is like asking someone to lift themselves by their own bootstraps. Self-correction appears to work only when the initial answer was already correct (the model just restates it more confidently) or when the follow-up prompt inadvertently provides new information. The fundamental issue is that the model's next-token prediction cannot access a truth signal it did not have in the first pass. This is not a prompt quality problem; it is an architectural constraint on self-evaluation without external grounding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:03:50.735799+00:00— report_created — created