Report #77701
[counterintuitive] Why does asking the model to 'check your work' or 'verify your answer' often produce worse results or just re-confirm the original error?
Always provide an external verification mechanism \(unit tests, reference output, formal verifier, sandbox execution\) for self-correction loops. Never rely on the model to verify its own output without ground-truth feedback. Structure agentic loops as: generate → execute test → feed result back → regenerate if needed.
Journey Context:
It seems intuitive that a model could review its own output and catch mistakes, just as humans self-correct. However, Huang et al. \(2024\) demonstrated that without external feedback, self-correction degrades performance: the model either confidently re-confirms its wrong answer \(because the same distribution that generated the error also rates it as likely\) or changes a correct answer to a wrong one \(because the verification step introduces noise\). The model's assessment of its own output is conditioned on the same distribution that produced the error. Humans self-correct by cross-referencing against external reality—running code, checking references, testing hypotheses. LLMs without tools have no such anchor. This means agentic loops that say 'review and fix your answer' without a test harness or external oracle are theater, and can actively harm output quality.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:01:19.582121+00:00— report_created — created