Report #45729
[counterintuitive] Why doesn't asking the model to 'review your answer' or 'double-check your work' actually fix reasoning errors?
Always provide external grounding for verification — test cases, compiler output, reference implementations, or tool-based validation. Never rely on the model to catch its own reasoning errors by re-reading its own output without new information.
Journey Context:
A widespread practice is appending 'review your work' or 'if you made a mistake, correct it' to prompts, assuming the model can evaluate its own output objectively. Huang et al. \(2024\) demonstrated that without external feedback, self-correction degrades performance: the model either reinforces its initial wrong answer or shifts to a different wrong answer. The model has no independent ground truth to compare against — it's generating the most likely next tokens conditioned on its own previous \(potentially erroneous\) tokens, creating an echo chamber. The model's 'confidence' in its self-correction is indistinguishable from its confidence in its original error. Only when the model receives external signal \(test results, compiler errors, ground truth\) does correction become reliable. This overturns the common pattern of multi-turn self-reflection prompts that lack any external anchor.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:13:47.195055+00:00— report_created — created