Report #87099
[counterintuitive] If I ask the model to check its own work, it should catch and fix its mistakes
For reliable error correction, provide external verification: executable test cases, tool output, reference answers, or a separate evaluation step. Self-correction loops without external grounding tend to make superficial wording changes or double down on wrong answers rather than genuinely fixing reasoning errors.
Journey Context:
The intuition comes from human metacognition: we can often catch our own mistakes by re-reading. But LLMs have no access to ground truth beyond their training distribution. When a model 'self-corrects' without external feedback, it generates new tokens conditioned on its own previous (potentially wrong) output, and nothing in that process verifies correctness. Huang et al. (2023) showed this empirically: self-correction without external feedback does not improve reasoning accuracy. The model may change its answer, but not reliably toward the correct one. Self-correction appears to work only when the model already knew the answer but initially phrased it poorly. For genuine reasoning errors, the model needs an external signal (a test result, a calculator output, a database lookup) to break out of its own distribution.
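As a rough illustration of external grounding, the sketch below drives a correction loop with executable test cases instead of asking the model to re-read its answer. Everything named here is hypothetical and not part of the report: `propose` stands in for whatever model call produces a candidate, `stub_propose` is a fake model used only to make the example runnable, and the test cases are assumed to come from a trusted source outside the model.

```python
from typing import Callable, List, Optional, Tuple

# Each case is (input, expected output). Assumption: these come from a
# trusted source outside the model (a spec, a reference answer, a human).
TestCase = Tuple[int, int]
Candidate = Callable[[int], int]


def run_tests(candidate: Candidate, cases: List[TestCase]) -> List[str]:
    """Run the candidate on every case; return failure messages (empty = pass)."""
    failures = []
    for arg, expected in cases:
        try:
            actual = candidate(arg)
        except Exception as exc:
            failures.append(f"f({arg}) raised {type(exc).__name__}: {exc}")
            continue
        if actual != expected:
            failures.append(f"f({arg}) returned {actual}, expected {expected}")
    return failures


def correct_with_external_feedback(
    propose: Callable[[List[str]], Candidate],
    cases: List[TestCase],
    max_rounds: int = 3,
) -> Optional[Candidate]:
    """Ask for a candidate, verify it externally, and feed failures back.

    `propose` is a hypothetical stand-in for a model call. It receives the
    failure messages from the previous round (empty on the first round).
    Returns a candidate that passes all cases, or None if none was found.
    """
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)
        feedback = run_tests(candidate, cases)
        if not feedback:
            return candidate
    return None


def stub_propose(feedback: List[str]) -> Candidate:
    """Fake model for demonstration: wrong first draft, fixed after feedback."""
    if not feedback:
        return lambda n: n * 2  # passes f(2) == 4 by accident, fails f(3)
    return lambda n: n * n


if __name__ == "__main__":
    fixed = correct_with_external_feedback(stub_propose, [(2, 4), (3, 9)])
    print("found passing candidate:", fixed is not None)
```

The point of the design is that the signal that ends or continues the loop comes from `run_tests`, not from the model's own judgment of its output. If no candidate passes within `max_rounds`, the function returns `None` rather than accepting an unverified answer.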
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:47:17.668532+00:00— report_created — created