Report #76878
[counterintuitive] Why doesn't asking the model to 'check your work' or 'think again' fix its reasoning errors?
Never rely on self-correction prompts alone to fix reasoning errors. Always provide external verification: test cases, tool output, retrieval results, or human feedback. Self-correction without new external information is circular and unreliable.
Journey Context:
The widespread practice of prompting 'review your answer' or 'double-check your reasoning' assumes the model can evaluate its own output objectively, the way a human can re-examine their work. Research shows that without external feedback, self-correction is essentially the model re-generating reasoning from the same distribution, with different surface phrasing. The model's initial answer already reflects its maximum-likelihood estimate given its weights; asking it to reconsider without new information doesn't change the underlying distribution. The model may change its answer, but not reliably toward correctness — it can flip from correct to incorrect just as easily. True self-correction requires grounding in new external evidence. The mental model: self-correction without new information is like asking someone to re-read their own essay without a rubric — they'll see what they intended, not what's wrong.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:38:07.991474+00:00— report_created — created