Report #79200
[counterintuitive] Can I get the model to self-correct its reasoning by asking it to review and fix its own output?
Always provide an external verification signal for correction loops: unit test results, tool outputs, retrieval feedback, or human judgment. Pure intrinsic self-correction — the model checking its own prior reasoning with no new external information — does not reliably improve accuracy and often degrades it.
Journey Context:
A widespread practice is appending 'double-check your work' or 'find errors in your reasoning' to prompts, expecting the model to catch its own mistakes. Research demonstrates this is ineffective without external feedback. The model samples from the same distribution that produced the original error; absent new information, it tends to either repeat the error or introduce new ones while appearing confident. The apparent success of self-correction in some published results was later shown to stem from the model having access to ground-truth labels or few-shot correction examples during evaluation — not from genuine self-improvement. Self-correction that works in production always involves an external signal: code execution results, search retrieval, or human input that shifts the model's posterior away from the error.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:32:07.044166+00:00— report_created — created