Report #90201
[counterintuitive] Asking the model to review or check its own work improves accuracy
Provide external verification mechanisms—test suites, linters, calculators, type checkers, human feedback—for correction loops. Self-reflection prompts without external signal do not reliably improve reasoning accuracy and can make it worse.
Journey Context:
A widespread pattern in prompt engineering is the self-correction loop: generate an answer, then ask the model to 'review your answer and fix any mistakes.' Huang et al. (2023) found that this does not reliably improve performance on reasoning tasks when the model has no access to external feedback. The underlying problem is close to a tautology about capability: if the model had the knowledge needed to identify its error, it likely would not have made the error in the first place, because the initial answer and the self-assessment draw from the same capability distribution. Without an external ground-truth signal (test results, tool output, human judgment, compiler errors), the model tends either to confidently reaffirm its original answer or to make changes that differ stylistically but are no more correct. In some cases, self-correction without feedback lowers accuracy by introducing new errors. For coding agents, this means that 'review your code for bugs' without running tests is theater. The write-execute-observe loop is not an optimization; it is the mechanism that makes correction possible.
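The write-execute-observe loop described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `correction_loop`, `run_tests`, and `stub_model` are hypothetical names introduced here, and `stub_model` is a hard-coded stand-in for a real model call. The point it tries to show is that the only thing fed back into the next attempt is the verifier's output, not a request to "look again".

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: a generator stands in for a model call, and a
# verifier stands in for any external signal (tests, linter, compiler).
Generator = Callable[[Optional[str]], str]
Verifier = Callable[[str], Tuple[bool, str]]


def correction_loop(generate: Generator, verify: Verifier, max_rounds: int = 3):
    """Write, execute, observe: revise only in response to external feedback."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)    # write
        ok, feedback = verify(candidate)  # execute and observe
        if ok:
            return candidate, round_no
    return None, max_rounds  # no candidate passed within the budget


def run_tests(source: str) -> Tuple[bool, str]:
    """External check: execute the candidate and compare against known cases."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # assumption: candidate source is trusted here
        for args, expected in [((2, 3), 5), ((-1, 1), 0)]:
            got = namespace["add"](*args)
            if got != expected:
                return False, f"add{args} returned {got}, expected {expected}"
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "all tests passed"


def stub_model(feedback: Optional[str]) -> str:
    """Stand-in for a model: buggy first draft, fixed once it sees a failure."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    code, rounds = correction_loop(stub_model, run_tests)
    print(f"passed after {rounds} round(s)")
```

In a real agent, `run_tests` would more plausibly invoke a sandboxed test runner or type checker, and the failure message would be appended to the next prompt. Calling `exec` on untrusted model output in-process, as done here for brevity, is not something to copy into production.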
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:59:50.228938+00:00 - report_created - created