Report #62203
[counterintuitive] Model cannot catch its own reasoning errors even with self-reflection prompts
Always provide external verification (unit tests, reference outputs, tool execution results) for the model's work; never rely on self-correction prompts alone to improve reasoning accuracy.
Journey Context:
A widespread practice is to append 'review your answer and fix any errors' to a prompt, on the assumption that the model can check its output the way a human proofreads. Research indicates this is unreliable: without external ground truth, the model's self-correction is simply more plausible-sounding text, and it often compounds errors rather than fixing them. The model evaluates its answer with the same generation process that produced it; there is no separate verification mechanism. Self-correction improves outcomes only when it is grounded in external feedback such as test results, tool outputs, or verified reference answers.
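The grounding described above can be sketched as a retry loop in which the only feedback passed back to the model comes from checks that run outside it. This is a minimal illustration, not a reference implementation: `solve_with_external_feedback`, `run_checks`, `fake_model`, and `check_add` are hypothetical names, and `fake_model` is a stand-in for a real model call that simply replays two canned answers.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: a generator takes (task, feedback) and returns a
# candidate; a check returns None on success or a failure message.
Generator = Callable[[str, Optional[str]], str]
Check = Callable[[str], Optional[str]]


def run_checks(candidate: str, checks: List[Check]) -> List[str]:
    """Run every external check and collect the failure messages."""
    failures = []
    for check in checks:
        message = check(candidate)
        if message is not None:
            failures.append(message)
    return failures


def solve_with_external_feedback(
    task: str,
    generate: Generator,
    checks: List[Check],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Regenerate until the external checks pass or the rounds run out.

    Returns (candidate, rounds_used); candidate is None if nothing passed.
    """
    feedback: Optional[str] = None
    for round_number in range(1, max_rounds + 1):
        candidate = generate(task, feedback)
        failures = run_checks(candidate, checks)
        if not failures:
            return candidate, round_number
        # Feedback is built only from check results, never from the
        # model's own opinion of its answer.
        feedback = "\n".join(failures)
    return None, max_rounds


# Illustration only: a canned "model" that is wrong first, right second.
_canned_answers = iter([
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
])


def fake_model(task: str, feedback: Optional[str]) -> str:
    return next(_canned_answers)


def check_add(source: str) -> Optional[str]:
    """A tiny unit test executed against the candidate source."""
    namespace: dict = {}
    # Assumption: the candidate is trusted here. Real model output should
    # be executed in a sandbox, not with a bare exec().
    exec(source, namespace)
    result = namespace["add"](2, 3)
    if result != 5:
        return f"add(2, 3) returned {result}, expected 5"
    return None
```

Calling `solve_with_external_feedback("write add(a, b)", fake_model, [check_add])` accepts the second candidate, because the first fails the test and the failure message is what drives the retry. A bare "review your answer" prompt has no equivalent of `check_add`, so nothing in it can distinguish the buggy candidate from the correct one.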
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:53:31.175798+00:00— report_created — created