Report #45355
[counterintuitive] Model can self-correct its reasoning if you ask it to review or double-check its work
Provide external verification (unit tests, tool output, formal checkers) instead of relying on self-correction prompts; self-correction without new external information does not improve and often degrades accuracy
Journey Context:
The widespread belief is that chain-of-thought self-correction, meaning asking the model to review and fix its own output, improves reasoning accuracy. Huang et al. (2023) reported that without external feedback, self-correction either maintains or degrades performance across multiple reasoning benchmarks. The core intuition: if the model could reliably identify its error, it would most likely not have made that error in the first pass. Autoregressive models lack an internal verification mechanism separate from their generation mechanism. When asked to "double-check," the model tends either to reproduce the same answer with a different rationalization or to generate a new answer that may be equally wrong, and it can also talk itself out of an answer that was originally correct. Self-correction works reliably only when the model receives genuinely new information from an external source (tool output, test results, human correction). This is not primarily a prompt engineering problem. It is a structural limitation of autoregressive generation, where the same weights produce both the answer and the verification.
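A minimal sketch of the recommended pattern follows: the retry loop is driven by an external check rather than by a "please double-check" prompt. Everything named here is hypothetical and for illustration only. `stub_model` stands in for a real model call, `run_unit_tests` stands in for whatever external verifier is available (test suite, tool output, formal checker), and the `add` task is a toy example. The point is only that each retry carries new information the model did not produce itself.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not a real library API):
# a generator takes the verifier's last report (None on the first attempt)
# and returns candidate source code; a verifier returns (passed, report).
Generator = Callable[[Optional[str]], str]
Verifier = Callable[[str], Tuple[bool, str]]


def run_unit_tests(source: str) -> Tuple[bool, str]:
    """External check: execute the candidate and compare against known cases."""
    namespace: dict = {}
    try:
        # exec is acceptable here only because the toy input is trusted;
        # real model output should run in a sandbox.
        exec(source, namespace)
        for args, expected in [((2, 3), 5), ((-1, 1), 0)]:
            actual = namespace["add"](*args)
            if actual != expected:
                return False, f"add{args} returned {actual}, expected {expected}"
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "all tests passed"


def solve_with_external_feedback(
    generate: Generator, verify: Verifier, max_rounds: int = 3
) -> Optional[str]:
    """Retry only with feedback that comes from outside the model."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(feedback)
        passed, report = verify(candidate)
        if passed:
            return candidate
        feedback = report  # new external information for the next attempt
    return None  # give up rather than trust an unverified answer


def stub_model(feedback: Optional[str]) -> str:
    """Stand-in for a model call: buggy first draft, fixed after feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


result = solve_with_external_feedback(stub_model, run_unit_tests)
```

In this sketch the loop returns `None` when no candidate passes within `max_rounds`, which is a deliberate choice: an answer that never passed external verification is reported as unsolved rather than accepted on the model's own say-so.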
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:36:02.646052+00:00— report_created — created