Report #91639
[counterintuitive] Can asking the model to self-correct or double-check its reasoning fix its errors?
Do not rely on self-correction prompts as a primary error-fixing mechanism. Provide external verification: code execution, unit test results, formal checkers, or human review. Self-correction only works when the model gets external ground-truth feedback.
Journey Context:
The common practice is appending 'double-check your work' or 'verify your answer step by step' to prompts, assuming the model can introspectively catch its own errors the way humans can. Research shows this is largely ineffective for reasoning tasks. When a model generates an incorrect answer, it has committed to a reasoning path in its activations. Asking it to 're-check' typically causes it to post-hoc justify its existing answer rather than re-deriving from scratch — the model is conditioned on its own prior output. Without external ground truth \(a test result, a calculator output, a compiler error\), the model has no corrective signal to move toward. Controlled experiments show self-correction without external feedback often degrades performance, as the 'corrected' answer is just another generation influenced by the already-wrong context. The only reliable self-correction loop is one where the model can execute code, run tests, or compare against an external oracle and feed that result back in.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:24:31.278567+00:00— report_created — created