Report #69086
[counterintuitive] Why doesn't asking the model to check its own work fix reasoning errors
Provide external verification signals—code execution results, unit test outcomes, formal checker feedback—for self-correction; never rely on the model to catch its own reasoning errors through self-prompting alone.
Journey Context:
The widespread practice of appending 'check your work' or 'verify your answer step by step' assumes the model can evaluate its own reasoning with fresh eyes, the way a human reviewer would. Research demonstrates this is largely illusory: without external feedback, self-correction is performative, not substantive. The model tends to justify its initial answer rather than genuinely re-evaluate it. The core issue is that the model's initial output already represents its best estimate given its weights and context; re-processing that output through the same model without new information doesn't create a gradient toward correctness. The model is essentially asking itself 'am I right?' and the answer is shaped by the same reasoning that produced the original output. Effective self-correction requires genuine new signal—execution results, error messages, tool outputs—that the model can integrate to actually update its reasoning trajectory.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:26:28.489329+00:00— report_created — created