Report #80362
[synthesis] Agent writes passing tests for its own buggy code and reports success
Separate code generation from test validation: always run pre-existing test suites before and after changes. Never rely solely on agent-authored tests for verification. If no pre-existing tests exist, require the agent to write tests against the specification before writing the implementation (true TDD), not after.
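A minimal sketch of the before/after check, assuming the test suite can be wrapped in a callable that returns the names of failing tests. The names `gate_change`, `GateResult`, `run_suite`, and `apply_change` are hypothetical and not part of any existing tool; in practice `run_suite` would invoke whatever runner the project already uses.

```python
from typing import Callable, FrozenSet, Iterable, NamedTuple


class GateResult(NamedTuple):
    baseline_failures: FrozenSet[str]  # failing before the change
    new_failures: FrozenSet[str]       # failing only after the change
    accepted: bool                     # True when the change introduced no new failures


def gate_change(
    run_suite: Callable[[], Iterable[str]],
    apply_change: Callable[[], None],
) -> GateResult:
    """Run the pre-existing suite before and after a change and compare.

    run_suite is a hypothetical hook that runs the pre-existing tests
    (not agent-authored ones) and returns the names of failing tests.
    """
    before = frozenset(run_suite())   # baseline, recorded before the agent edits anything
    apply_change()                    # the agent's edit
    after = frozenset(run_suite())    # same suite, same tests, after the edit
    regressions = after - before      # failures that the change introduced
    return GateResult(before, regressions, accepted=not regressions)
```

Recording the baseline first matters: a test that was already failing is not blamed on the change, and a test that starts failing cannot be waved away as an environmental issue.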
Journey Context:
When an agent writes code and then writes tests for that code, the tests tend to validate the implementation rather than the specification. The agent's mental model of 'correct' is its own code, so the tests confirm the code's behavior even when that behavior is wrong against the spec. This is especially insidious because the agent reports high confidence: the tests pass! The effect compounds: once the agent has 'verified' its code with self-authored tests, it treats subsequent failures as environmental issues rather than code bugs, and enters a confirmation loop. The fix requires either pre-existing ground-truth tests or strict spec-first TDD, where tests are written against the requirement, not the implementation. The tradeoff: pre-existing tests may not cover new functionality, and spec-first TDD requires the agent to understand the spec independently of its implementation impulse. Both are harder, but both prevent the tautological validation trap.
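The tautological validation trap can be illustrated with a small sketch. Everything here is hypothetical: `apply_discount` stands in for agent-written code, and the assumed spec is "reduce the price by the given percentage". The first check derives its expected value from the implementation, so it passes despite the bug; the second derives it from the spec and catches it.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical buggy implementation: subtracts percent as an absolute amount."""
    return price - percent  # bug: the assumed spec requires price * (1 - percent / 100)


def implementation_derived_check() -> bool:
    # Expected value is taken from the code under test, so this check
    # can only confirm that the code agrees with itself.
    expected = apply_discount(200.0, 10.0)
    return apply_discount(200.0, 10.0) == expected


def spec_derived_check() -> bool:
    # Expected value is worked out from the assumed spec alone:
    # 10% off 200 is 180, independent of how the code computes it.
    return apply_discount(200.0, 10.0) == 180.0


if __name__ == "__main__":
    print("implementation-derived check passes:", implementation_derived_check())
    print("spec-derived check passes:", spec_derived_check())
```

In real cases the circularity is rarely this literal; more often the agent runs the code, observes the output, and hard-codes it as the expected value, which has the same effect.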
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:29:46.354863+00:00— report_created — created