Report #44304
[counterintuitive] Why can't the model reliably self-correct its reasoning within a single generation?
Don't rely on in-generation self-correction prompts ('check your work', 'if you made a mistake, fix it'). Instead, use multi-turn workflows in which the model generates an answer, receives external feedback (code execution results, test outputs, or verification from a separate call), and then revises. For complex reasoning, use best-of-N sampling or tree-of-thought approaches with external evaluation.
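The generate, check, revise loop can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `fake_model`, `run_tests`, and `generate_with_feedback` are hypothetical names, and the stubbed model stands in for a real LLM call so that the example runs on its own. The point is that the revision prompt is built from the test output, not from the model's own opinion of its answer.

```python
from typing import Callable, List, Optional, Tuple


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call (an assumption for this sketch)."""
    if "FEEDBACK" in prompt:
        # Revised attempt, produced only after seeing external test output.
        return "def add(a, b):\n    return a + b\n"
    # First attempt contains a deliberate bug.
    return "def add(a, b):\n    return a - b\n"


def run_tests(source: str) -> Optional[str]:
    """External check: run the candidate and return an error message, or None if it passes."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # only safe here because the source is our own stub
        result = namespace["add"](2, 3)
        if result != 5:
            return f"add(2, 3) returned {result}, expected 5"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def generate_with_feedback(
    task: str, model: Callable[[str], str], max_rounds: int = 3
) -> Tuple[Optional[str], List[str]]:
    """Generate, verify externally, and revise until the tests pass or rounds run out."""
    prompt = task
    attempts: List[str] = []
    for _ in range(max_rounds):
        candidate = model(prompt)
        attempts.append(candidate)
        feedback = run_tests(candidate)
        if feedback is None:
            return candidate, attempts
        # Each new turn carries the external signal back to the model.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{candidate}\n"
            f"FEEDBACK: {feedback}\nRevise the code."
        )
    return None, attempts


final, attempts = generate_with_feedback("Write add(a, b).", fake_model)
print(len(attempts), run_tests(final))
```

In a real workflow the model output would be untrusted, so the execution step would need a sandbox instead of a bare `exec`.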
Journey Context:
Developers prompt models to self-correct mid-generation, assuming the model can evaluate and revise its own output. But autoregressive models generate left-to-right and cannot revise earlier tokens; they can only generate new tokens that contradict earlier ones. Research by Huang et al. shows that, without external feedback, self-correction within a single generation often makes outputs worse: the model produces a plausible-sounding 'correction' that rationalizes its initial answer or introduces new errors. Genuine self-correction requires an external ground truth signal (test results, tool output, human feedback). The model cannot serve as its own verifier for the same reason a student cannot reliably grade their own exam without an answer key: the same reasoning that produced the error will likely validate it. The counterintuitive finding is that prompting for self-correction without external feedback degrades output quality more often than it improves it.
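One way to make the external ground truth signal concrete is best-of-N selection, where several sampled candidates are ranked by a check the model does not control. The sketch below is illustrative only: `sample_candidates`, `external_score`, `best_of_n`, and `TEST_CASES` are hypothetical names, and the hard-coded candidates stand in for N sampled model outputs for the task "square a number".

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Callable[[int], int]

# External answer key: (input, expected output) pairs the model never sees.
TEST_CASES: List[Tuple[int, int]] = [(0, 0), (2, 4), (3, 9), (-4, 16)]


def sample_candidates() -> List[Tuple[str, Candidate]]:
    """Hypothetical stand-in for N sampled model outputs (an assumption for this sketch)."""
    return [
        ("doubles", lambda x: x * 2),
        ("squares", lambda x: x * x),
        ("adds_two", lambda x: x + 2),
    ]


def external_score(fn: Candidate, cases: Sequence[Tuple[int, int]]) -> float:
    """Fraction of test cases the candidate passes; exceptions count as failures."""
    passed = 0
    for arg, expected in cases:
        try:
            if fn(arg) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(cases)


def best_of_n(
    candidates: Sequence[Tuple[str, Candidate]],
    cases: Sequence[Tuple[int, int]],
) -> Tuple[str, float]:
    """Pick the candidate with the highest external score."""
    scored = [(name, external_score(fn, cases)) for name, fn in candidates]
    return max(scored, key=lambda item: item[1])


print(best_of_n(sample_candidates(), TEST_CASES))
```

The selection step never asks a candidate to judge itself, so a candidate that is confidently wrong gains no advantage from sounding plausible.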
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:50:06.137318+00:00— report_created — created