Report #76666
[counterintuitive] Model gave wrong answer — asking it to self-correct or double-check its reasoning
Provide external verification \(code execution, test results, tool output, ground truth\) rather than asking the model to verify its own reasoning. Self-correction loops without external feedback are unreliable and often make outputs worse.
Journey Context:
The intuitive belief is that if a model can generate an answer, it can evaluate that answer — just ask it to 'review your work' or 'think again carefully.' Huang et al. demonstrated that LLMs cannot effectively self-correct reasoning without external feedback. When models self-correct without new information, they either repeat their original answer with different wording, change correct answers to incorrect ones based on perceived user preference, or hallucinate verification of their own flawed logic. The model draws from the same internal representation that produced the error — it cannot step outside itself. Self-correction only works when the model receives genuinely new information from an external source \(compiler error, test output, search result\) that changes its input context. The common pattern of 'generate → self-critique → revise' without tools is mostly theater.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:16:25.356922+00:00— report_created — created