Report #39179
[counterintuitive] Model gave a wrong reasoning answer — ask it to review its work and self-correct
Always provide external feedback for correction (test results, error messages, formal verification); pure self-correction without new information is unreliable and often regurgitates the same error with more confidence
Journey Context:
The common pattern is: model gives answer → answer is wrong → prompt 'are you sure? double-check your work' → model gives a different (sometimes correct) answer. This creates the illusion of self-correction. But research shows that without external feedback, self-correction is largely ineffective for reasoning tasks. The model that produced a wrong answer draws from the same flawed reasoning distribution when asked to 'check' it, and it has no independent verification mechanism. When self-correction appears to work, it is usually because the follow-up prompt shifts the sampling distribution enough to land on a different answer, not because the model identified and fixed its error. True correction requires new information: running code and seeing the error, checking against a database, getting human feedback. The practical fix: always pair LLM reasoning with an executable verification step. If the model writes code, run it. If the model makes a claim, check it against a source. Do not ask the model to be its own oracle.
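The "run it and feed the error back" loop described above could look roughly like the sketch below. It is a minimal illustration, not a reference implementation: `verify`, `solve_with_feedback`, and `fake_model` are hypothetical names, and `fake_model` is a stub standing in for a real LLM call. The point it tries to show is that the retry prompt carries a concrete test failure (new information) instead of a bare "are you sure?".

```python
from typing import Callable, Optional


def verify(candidate_src: str, tests: list, func_name: str) -> Optional[str]:
    """Execute candidate code and run it against (args, expected) pairs.

    Returns None if every test passes, otherwise a short error message
    that can be handed back to the model as external feedback.
    NOTE: exec() on untrusted model output is unsafe; a real setup would
    run this in a sandbox or a separate process.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)
        func = namespace[func_name]
        for args, expected in tests:
            got = func(*args)
            if got != expected:
                return f"{func_name}{args} returned {got!r}, expected {expected!r}"
    except Exception as exc:  # syntax errors, missing function, runtime errors
        return f"{type(exc).__name__}: {exc}"
    return None


def solve_with_feedback(
    generate: Callable[[Optional[str]], str],
    tests: list,
    func_name: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask `generate` for code, verify it, and retry with the concrete failure."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(feedback)
        feedback = verify(candidate, tests, func_name)
        if feedback is None:
            return candidate  # passed external verification
    return None  # never verified; do not trust any of the answers


# Hypothetical stand-in for an LLM: wrong at first, fixed once it sees feedback.
def fake_model(feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


tests = [((2, 3), 5), ((0, 0), 0)]
result = solve_with_feedback(fake_model, tests, "add")
```

The design choice here is that acceptance is decided by `verify`, never by the model's own confidence. If no candidate passes within `max_rounds`, the loop returns `None` instead of falling back to the last answer.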
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:14:15.021022+00:00 - report_created - created