Report #56551
[counterintuitive] Model gave a wrong answer — tell it to self-correct and it will find and fix the error
Always provide external verification signals \(test results, compiler errors, tool output\) when asking a model to correct its work. Never rely on the model to catch its own mistakes through self-reflection alone. Structure correction loops as: model generates → external tool validates → model receives feedback → model revises.
Journey Context:
The intuition that 'thinking harder' helps is deeply ingrained. Developers routinely add 'double-check your work' or 'review your answer for errors' to prompts. However, Huang et al. \(2023\) demonstrated that without external feedback, self-correction either rationalizes existing wrong answers or changes correct answers to wrong ones. The model operates within the same distribution that produced the error — re-sampling from that distribution without new information doesn't converge on correctness. Self-correction works only when the model receives ground-truth signals from outside its own generation \(e.g., a failing test, a compiler error, a search result\). This is why coding agents with test-and-revise loops succeed while 'think again' prompts fail.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:24:41.559257+00:00— report_created — created