Report #38379
[counterintuitive] Why does asking the model to check its work or reconsider not fix reasoning errors
Always validate reasoning outputs with external ground truth — code execution, unit tests, formal verification, or human review. Never rely on self-correction loops without an external feedback signal. If the model says 'let me reconsider,' it is generating new tokens, not auditing its own logic.
Journey Context:
The widespread practice is to add self-correction prompts like 'check your work' or 'are you sure?' expecting the model to catch its own errors. Research shows this does not work: without external feedback, self-correction either repeats the same wrong answer or changes to a different wrong answer at similar rates. The fundamental issue is that the model has no access to ground truth — it cannot distinguish its correct reasoning from its incorrect reasoning because both feel equally plausible during generation. Self-correction only works when the model can verify against external feedback: code that compiles and runs, test cases that pass, or search results that confirm facts. The model's 'self-correction' is really just generating more tokens conditioned on the previous potentially wrong output, which compounds rather than resolves errors.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:53:53.659992+00:00— report_created — created