Report #72491
[counterintuitive] Asking the model to check its own work and self-correct should improve accuracy
For verification, always use external tooling — unit tests, interpreters, formal verifiers, or a separate model call with different context and information. Do not rely on the same model instance verifying its own output without new external information.
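As a rough illustration of the unit-test route, the sketch below runs a model-produced function and its tests in a fresh interpreter process, so the verdict comes from execution rather than from the model's opinion of its own output. The helper name `verify_with_tests`, the `median` candidate, and the test strings are hypothetical and not part of this report. A subprocess is not a sandbox, so untrusted code would still need proper isolation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def verify_with_tests(candidate_source: str, test_source: str, timeout: float = 10.0):
    """Run candidate code plus its tests in a fresh interpreter.

    Returns (passed, feedback), where feedback is the combined stdout/stderr.
    Note: a subprocess is NOT a security sandbox (assumption: trusted input).
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        script.write_text(candidate_source + "\n\n" + test_source, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout} seconds"
    return result.returncode == 0, (result.stdout + result.stderr).strip()


# Hypothetical model output: a median function that is wrong for even lengths.
candidate = '''
def median(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2]
'''

# Tests written independently of the model's answer.
tests = '''
assert median([3, 1, 2]) == 2
assert median([4, 1, 3, 2]) == 2.5
'''

passed, feedback = verify_with_tests(candidate, tests)
```

Here `passed` reflects the interpreter's exit code, and `feedback` carries the traceback that can be handed back to the model as new information.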
Journey Context:
The intuition is strong: humans check their work, so why can't LLMs? The critical difference is that humans can access independent verification mechanisms (re-deriving from first principles, checking against external references). An LLM self-correcting without external feedback is circular: it uses the same weights and representations that produced the error to evaluate whether an error exists. Research on self-correction without external feedback reports that it often degrades performance: the model may 'correct' answers that were already correct, or introduce new errors. When self-correction appears to work, it is typically because the follow-up prompt triggers a different reasoning path that happens to reach the right answer, not because the model genuinely verified its prior output. True correction requires new information (tool output, retrieval results, human feedback).
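The point about new information can be sketched as a retry loop in which every retry carries concrete output from an external check, and the model is never simply asked whether it is sure. All names here (`correct_with_feedback`, `fake_generate`, `verify_product`) are hypothetical. `fake_generate` is a stand-in for a real model call, scripted to fail once so the loop can be followed end to end.

```python
from typing import Callable, Optional, Tuple

Verifier = Callable[[str], Tuple[bool, str]]
Generator = Callable[[str, Optional[str]], str]


def correct_with_feedback(task: str, generate: Generator, verify: Verifier,
                          max_rounds: int = 3) -> Tuple[str, bool]:
    """Retry only when an external verifier supplies new information.

    Returns (answer, verified). Each retry receives the verifier's
    concrete feedback instead of a bare "check your work" prompt.
    """
    feedback: Optional[str] = None
    answer = ""
    for _ in range(max_rounds):
        answer = generate(task, feedback)
        ok, feedback = verify(answer)
        if ok:
            return answer, True
    return answer, False  # still unverified after max_rounds attempts


# Stand-in for a model call (hypothetical): wrong first, right once it sees feedback.
def fake_generate(task: str, feedback: Optional[str]) -> str:
    return "408" if feedback else "398"


# External check: the interpreter recomputes the product, not the model.
def verify_product(answer: str) -> Tuple[bool, str]:
    expected = 17 * 24
    if answer.strip() == str(expected):
        return True, ""
    return False, f"17 * 24 evaluates to {expected}, but the answer was {answer!r}"


answer, verified = correct_with_feedback("What is 17 * 24?", fake_generate, verify_product)
```

Returning the `verified` flag alongside the answer lets the caller tell a checked result from one that merely survived the retry budget.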
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:15:57.887082+00:00— report_created — created