Report #88920
[counterintuitive] Why doesn't adding 'check your work' or 'self-correct your reasoning' to prompts reliably improve answer accuracy?
Do not rely on self-correction loops where the model reviews its own output without receiving new external information. If verification is needed, provide a tool that returns ground truth: code execution for math, a search API for facts, a schema validator for format. Self-correction is reliable only when the correction step introduces information the model did not already have.
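A minimal sketch of the "schema validator for format" case, assuming a hypothetical `generate(prompt) -> str` model call. The names `validate_record`, `generate_with_verification`, `fake_generate`, and the `name`/`age` schema are illustrative, not any library's API. The point it shows is that a retry happens only when an external check fails, and the retry prompt carries that check's concrete error rather than a generic "check your work".

```python
import json
from typing import Callable, Optional, Tuple

REQUIRED_KEYS = ("name", "age")  # hypothetical schema for illustration


def validate_record(text: str) -> Optional[str]:
    """External check: return None if the output is valid, else a concrete error message."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"Invalid JSON: {exc.msg} at position {exc.pos}"
    if not isinstance(data, dict):
        return "Top-level value must be an object"
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        return f"Missing required keys: {missing}"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(data["age"], int) or isinstance(data["age"], bool):
        return "Field 'age' must be an integer"
    return None


def generate_with_verification(
    generate: Callable[[str], str], prompt: str, max_rounds: int = 3
) -> Tuple[str, bool]:
    """Retry only with validator feedback; never ask the model to re-check blind."""
    answer = generate(prompt)
    for _ in range(max_rounds):
        error = validate_record(answer)
        if error is None:
            return answer, True
        # The validator's error is the new external information the retry depends on.
        prompt = (
            f"{prompt}\n\nYour previous output:\n{answer}\n\n"
            f"Validator error: {error}\nReturn corrected output only."
        )
        answer = generate(prompt)
    return answer, validate_record(answer) is None


def fake_generate(prompt: str) -> str:
    """Stand-in for a model call: it changes its output only once the prompt carries validator feedback."""
    if "Validator error" in prompt:
        return '{"name": "Ada", "age": 36}'
    return '{"name": "Ada", "age": "36"}'


if __name__ == "__main__":
    print(generate_with_verification(fake_generate, "Return JSON with name and age."))
```

With the stub, the first output fails the integer check, the retry prompt includes `Field 'age' must be an integer`, and the loop returns the corrected JSON with `True`. If the validator passes on the first try, no correction round is spent at all.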
Journey Context:
A widespread practice is appending 'verify your answer' or 'if you made a mistake, correct it' to prompts, on the assumption that the model can introspect and catch its own errors the way humans do. Huang et al. (2023) demonstrated that without external feedback, self-correction either maintains or degrades performance: the model tends to rationalize its initial answer or drift to a different wrong answer. The human intuition fails because humans self-correct by re-examining evidence or recomputing from scratch, not by re-reading their own prior conclusions. The model's internal sense of its own confidence is not calibrated well enough to serve as a reliable error signal. When self-correction appears to work in practice, it is almost always because the correction step includes new information (tool output, a retrieval result, an execution trace); the improvement comes from that information, not from the self-correction instruction itself. Pure textual self-correction is roughly equivalent to generating the answer twice and hoping the second attempt is better.
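To make the "new information" point concrete for the math case, here is a sketch of a recompute-from-scratch check. It assumes arithmetic claims arrive as an expression plus a claimed value; `evaluate` and `check_claim` are hypothetical helpers, and the evaluator deliberately supports only `+`, `-`, `*`, `/` and unary minus via Python's `ast` module instead of calling `eval`, which would run arbitrary code. The feedback string it returns contains a value the model's own text never had; a bare "check your work" instruction supplies nothing comparable.

```python
import ast
import operator
from typing import Optional

# Whitelisted operators: anything else is rejected rather than executed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def evaluate(expr: str) -> float:
    """Recompute an arithmetic expression independently of the model's answer."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


def check_claim(expr: str, claimed: float, tol: float = 1e-9) -> Optional[str]:
    """Return None if the claim matches execution, else feedback carrying the true value."""
    actual = evaluate(expr)
    if abs(actual - claimed) <= tol:
        return None
    return f"Execution of {expr} returned {actual}, not {claimed}."


if __name__ == "__main__":
    print(check_claim("17 * 24", 418))  # a wrong claim yields feedback with the executed value
    print(check_claim("17 * 24", 408))  # a correct claim yields None: nothing to correct
```

Only when `check_claim` returns a message is there anything for a correction step to act on, and the message itself is what makes the second attempt better informed than the first.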
Lifecycle
2026-06-22T07:50:22.473083+00:00— report_created — created