Report #78429
[counterintuitive] Model made a reasoning mistake — it should be able to notice and self-correct in the same generation
Design workflows with separate generate-then-verify steps. Use an external evaluation loop (generate, check, regenerate) rather than relying on the model to catch its own errors within a single autoregressive pass. For code, execute and test rather than asking the model to review its own output.
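A minimal sketch of the generate, check, regenerate loop described above, in Python. All names here (`Verdict`, `generate_with_verification`, `make_stub_model`, `check_arithmetic`) are hypothetical and chosen for illustration; the stub stands in for a real model call, and the checker is a toy arithmetic check rather than any particular evaluation framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Result of an independent check on one candidate output."""
    ok: bool
    feedback: str = ""


def generate_with_verification(
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Verdict],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate that passes `verify`, or None if none does."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # Each attempt is a fresh generation; the failed output is discarded
        # and only the checker's feedback is carried forward.
        candidate = generate(prompt, feedback)
        verdict = verify(candidate)
        if verdict.ok:
            return candidate
        feedback = verdict.feedback
    return None


def make_stub_model() -> Callable[[str, Optional[str]], str]:
    """Hypothetical stand-in for a model: wrong on the first call, right on the second."""
    answers = iter(["17 * 24 = 398", "17 * 24 = 408"])

    def generate(prompt: str, feedback: Optional[str]) -> str:
        return next(answers)

    return generate


def check_arithmetic(candidate: str) -> Verdict:
    """Independent check: recompute the product instead of trusting the text."""
    lhs, rhs = candidate.split("=")
    a, b = (int(x) for x in lhs.split("*"))
    if a * b == int(rhs):
        return Verdict(True)
    return Verdict(False, f"{a} * {b} is not {int(rhs)}")


if __name__ == "__main__":
    result = generate_with_verification(
        make_stub_model(), check_arithmetic, "What is 17 * 24?"
    )
    print(result)
```

The point of the design is that the decision to accept or reject an output is made by code outside the generation, so a confident but wrong answer cannot talk its way past the check.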
Journey Context:
The widespread belief is that if a model makes an early mistake in its reasoning chain, it can 'notice' the error and self-correct in subsequent tokens, as if it had an internal feedback loop. This is wrong. Autoregressive models commit to their earlier tokens: each new token is conditioned on everything generated so far, errors included. Once the model emits an incorrect intermediate step, the tokens that follow are generated as though that step were true. The model cannot un-generate or truly backtrack. It can sometimes produce text that looks like self-correction ('wait, that's wrong, let me reconsider'), but research shows this is itself a learned linguistic pattern, not genuine computational backtracking. The model is still generating the most likely next token given the now error-contaminated prefix, and 'self-correction' text often introduces new errors rather than fixing the original one. Studies show that self-correction without external feedback does not reliably improve reasoning accuracy, and can make it worse. For reliable correction, you need an external loop: generate, evaluate with an independent check (code execution, test cases, a separate model call), and regenerate if needed.
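For generated code, the independent check can be actual execution. The sketch below, in Python, runs a candidate together with a few assertions in a separate interpreter process and reports only pass or fail. The function name `passes_tests` and the example sources are hypothetical. This is not a sandbox: a separate process and a timeout limit hangs, not malicious behavior, so untrusted output should only be run in a properly isolated environment.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(candidate_source: str, test_source: str, timeout_s: float = 5.0) -> bool:
    """Run candidate code followed by assertions in a separate Python process.

    Returns True only if the process exits with status 0 within the timeout.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate_check.py"
        script.write_text(candidate_source + "\n\n" + test_source, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A hang counts as a failure; the caller can regenerate.
            return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical model outputs: one correct, one with a sign error.
    good = "def add(a, b):\n    return a + b\n"
    bad = "def add(a, b):\n    return a - b\n"
    tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

    print(passes_tests(good, tests))
    print(passes_tests(bad, tests))
```

A failing result here would typically trigger a new generation, optionally with the captured error output passed back as feedback, rather than a request for the model to re-read its own code.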
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:14:03.868079+00:00— report_created — created