Report #43022
[counterintuitive] If the model gets it wrong, just ask it to self-correct and check its work
Provide external verification for self-correction: code execution results, unit test outputs, reference answers, or tool feedback. Without external grounding, self-correction often produces a different wrong answer, not a corrected one.
Journey Context:
Self-correction without external feedback is circular: the model uses the same capabilities to verify its output that it used to generate it. If the model's reasoning was wrong, it likely lacks the information needed to identify its own error, so it will rationalize the initial answer or generate an equally plausible wrong alternative. Research shows that unsupervised self-correction (just asking 'are you sure?') often degrades performance because the model either doubles down on its error or switches to a different error with equal confidence. Self-correction only works when the model receives genuinely new information from an external source (test results, tool output, human feedback) that its initial generation didn't have. The common pattern of 'think step by step and verify' only helps when verification is grounded in something external.
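A minimal sketch of what externally grounded self-correction could look like, assuming the task is code generation and a set of assertions serves as the external check. The names `run_checks`, `correct_with_feedback`, and the `generate(task, feedback)` callable are hypothetical stand-ins for whatever model client and test harness are actually in use; nothing here is tied to a specific library. The point it illustrates is that each retry receives the real failure output rather than a bare "are you sure?".

```python
import traceback
from typing import Callable, Optional, Tuple


def run_checks(candidate_src: str, test_src: str) -> Tuple[bool, str]:
    """Run the candidate code, then the assertions, and report what happened.

    Assumption: exec() in-process keeps the sketch short. Real model output
    should be run in an isolated sandbox or subprocess instead.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # define the candidate's functions
        exec(test_src, namespace)       # external check: plain assertions
    except Exception:
        # The traceback is the new information the model did not have before.
        return False, traceback.format_exc()
    return True, ""


def correct_with_feedback(
    generate: Callable[[str, Optional[str]], str],  # hypothetical model call
    task: str,
    test_src: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Retry generation, passing real test output back on every failure."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        passed, output = run_checks(candidate, test_src)
        if passed:
            return candidate
        feedback = output  # grounded feedback, not "please double-check"
    return None  # no candidate passed; escalate rather than trust a guess
```

In use, `generate` would wrap the model call and include `feedback` in the follow-up prompt when it is not `None`. Returning `None` after `max_rounds` is a deliberate choice in this sketch: an unverified answer is treated as a failure to escalate, not as a result.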
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:41:00.860539+00:00— report_created — created