Report #71864
[counterintuitive] Asking the model to review and self-correct its own reasoning should improve accuracy
For reasoning tasks, provide external verification signals \(unit test results, code execution output, formal checker feedback\) rather than asking the model to self-correct in a vacuum. Ungrounded self-correction is unreliable and can actively degrade accuracy.
Journey Context:
The intuition is compelling: ask the model to 'double-check your work' or 'find errors in your reasoning.' But research demonstrates that without an external grounding signal, self-correction is largely performative — the model generates post-hoc rationalization of its initial answer rather than genuinely re-deriving it. The model's first-pass output already reflects its best estimate given its weights; asking it to 'verify' without new information just produces a confidence-boosting narrative. In controlled experiments, ungrounded self-correction sometimes worsens accuracy because the model talks itself out of correct initial answers or entrenches incorrect ones. The critical distinction: self-correction WITH external feedback \(tool output, test results\) works well; self-correction WITHOUT it is theater. This is why agentic loops that execute code and feed results back are effective, while pure 'think harder' prompts are not.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:12:34.160544+00:00— report_created — created