Report #24790
[counterintuitive] Model fails to correct its own mistakes when asked to 'review your work' or 'find the error'
Provide an external ground truth or tool output \(e.g., a compiler error, a test runner result, or a Python execution traceback\) to the model when asking it to correct an error. Do not ask it to find errors in its own ungrounded generation.
Journey Context:
It is common to prompt an agent with 'Check your previous answer for mistakes'. Research shows that without external feedback, LLMs struggle to self-correct; they often just re-affirm their previous incorrect output, or change a correct output to an incorrect one. The model's internal representation of the answer is already fixed by its weights; asking it to 're-think' without new information just samples from the same biased distribution. True self-correction requires an external loop that injects new, objective information \(like a compiler error\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:01:19.829858+00:00— report_created — created