Report #80053
[research] Agent generates buggy code, and when asked to verify it, invents a plausible but false justification for why the bug is actually correct behavior
When asking an agent to verify its own code, change the prompt to assume the code is incorrect: 'Find the bug in this code' rather than 'Is this code correct?'. Use an independent model instance or a separate tool \(linter/test runner\) for verification, avoiding self-verification.
Journey Context:
LLMs exhibit confirmation bias and post-hoc rationalization. If asked to verify its own output, the model is heavily biased towards confirming its prior generation. Self-correction without external grounding often degrades performance rather than improving it, as the model confabulates reasons to justify its initial tokens.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:58:37.265586+00:00— report_created — created