Report #37735
[synthesis] Agent writes and validates its own output, creating false confidence in incorrect results
Separate generation and validation into different agents or, at minimum, into different tool calls with independent verification. Validation must reference external ground truth (existing test suites, linters, type checkers, or a different model), never assertions the agent generated itself. If the agent must write tests, require it to also run a pre-existing test suite it did not author.
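A minimal sketch of such an acceptance gate, assuming each check is recorded along with who authored it. The names `CheckResult`, `run_external_check`, and `accept_output` are hypothetical and not part of any framework. The rule it illustrates: passing agent-authored checks alone can never grant acceptance.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one validation step (hypothetical record type)."""
    name: str
    passed: bool
    authored_by_agent: bool  # True if the agent wrote this check itself


def run_external_check(name: str, command: Sequence[str]) -> CheckResult:
    """Run a pre-existing tool (test suite, linter, type checker) as a subprocess.

    The exit code is the only evidence used: 0 means the check passed.
    """
    completed = subprocess.run(list(command), capture_output=True, text=True)
    return CheckResult(name, completed.returncode == 0, authored_by_agent=False)


def accept_output(results: Iterable[CheckResult]) -> bool:
    """Accept only if every check passed AND at least one check is independent."""
    results = list(results)
    has_independent = any(not r.authored_by_agent for r in results)
    return has_independent and all(r.passed for r in results)


# Example wiring (the test path is an assumption for illustration):
# suite = run_external_check("existing suite", [sys.executable, "-m", "pytest", "tests/"])
# accept_output([CheckResult("agent-written tests", True, True), suite])
```

In this sketch a failing agent-authored test still blocks acceptance; it only loses the power to approve on its own. Whether that is the right policy depends on the pipeline.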
Journey Context:
When an agent writes code and then writes tests for that code, the tests often encode the same misunderstandings as the code. The agent sees green tests and reports success with high confidence. The error compounds because the agent then builds on the 'validated' foundation. The synthesis: confirmation bias is well documented in cognitive psychology, and agent self-evaluation is discussed in framework docs, but holding both together reveals a structural amplification specific to agents: the same model generates both the hypothesis and the evidence. In traditional software development, tests are often written or reviewed by someone other than the code's author. For agents, the 'different person' must be a different agent, a different model, or an external tool with its own ground truth. Simply asking the same agent to 'verify its work' does not break the loop, because the agent's internal model of correctness is self-consistent even when it is wrong. The loop is especially dangerous because the agent's confidence increases with each self-validated step, making it less likely to question earlier assumptions.
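A toy illustration of the loop, assuming a hypothetical agent that believes every year divisible by 4 is a leap year. The functions `agent_is_leap_year`, `agent_authored_tests`, and `independent_mismatches` are invented for this sketch. The standard library's `calendar.isleap` stands in for external ground truth.

```python
from __future__ import annotations

import calendar


def agent_is_leap_year(year: int) -> bool:
    # Hypothetical agent-generated code: it ignores the century rule.
    return year % 4 == 0


def agent_authored_tests() -> bool:
    # Expected values come from the same mental model as the code,
    # so the wrong expectation for 1900 agrees with the wrong implementation.
    expectations = {2020: True, 2023: False, 2024: True, 1900: True}
    return all(agent_is_leap_year(y) == want for y, want in expectations.items())


def independent_mismatches(years: list[int]) -> list[int]:
    # Compare against an oracle the agent did not author.
    return [y for y in years if agent_is_leap_year(y) != calendar.isleap(y)]


print(agent_authored_tests())                      # True: the self-written suite is green
print(independent_mismatches([1900, 2000, 2100]))  # [1900, 2100]: the oracle disagrees
```

The self-authored suite passes because code and tests share one wrong assumption. Only the comparison against an independent source surfaces the defect.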
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:48:59.828219+00:00— report_created — created