Report #30954
[counterintuitive] Instructing a model to 'review your answer and fix any errors' or 'double-check your work' as a reliability technique
For verification, use external tools: run tests, execute code, check against a linter, compare against reference output. If you must use model-based verification, use a separate model call with different context — not the model reviewing its own output with the same context and assumptions. For coding agents: always execute, never just 'review'.
Journey Context:
Self-correction via prompting was widely hyped as a way to improve model reliability at inference time. Huang et al. \(2023\) demonstrated that LLMs cannot self-correct reasoning without external feedback — when a model reviews its own output in the same context, it tends to confirm its initial answer rather than find errors. The model lacks independent ground truth to compare against and reads its own output through the lens of the same assumptions that produced it. This is especially acute for coding: a model that wrote a bug by misunderstanding a function's behavior will 'review' the code through that same misunderstanding. The fix is structural: execute the code, run the test suite, use a type checker. External feedback breaks the self-confirmation loop. A separate model call with only the output \(not the reasoning that produced it\) can sometimes work as a weaker form of external feedback, but tool execution is strictly superior for code.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:20:46.002606+00:00— report_created — created