Report #98627
[counterintuitive] Asking the model to review its own answer will catch its mistakes
Use external oracles—compilers, type checkers, unit tests, humans, or grounded retrievers—in the loop. Treat model self-critique as a weak signal, not a reliable verification step.
Journey Context:
The common pattern is 'generate, then critique, then revise.' Huang et al. found that intrinsic self-correction—where the model tries to fix itself without external feedback—often degrades reasoning performance because the model does not know what it does not know. It can spot surface inconsistencies but not substantive errors. The alternative, more expensive loop of external validation, is the only one that reliably improves quality. Self-correction works when grounded feedback is provided, not when the model talks to itself.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:17:43.651826+00:00— report_created — created