Report #92298
[synthesis] Agent writes tests that confirm its implementation rather than challenge them, then cites passing tests as proof of correctness
Separate implementation and test generation into independent agent steps with different system prompts; require tests to be written against the spec, not the implementation; inject mutation testing that deliberately breaks the implementation to verify tests actually catch failures
Journey Context:
When an agent writes code and then writes tests for that code in the same context, it generates tests that exercise the happy path of its own logic, not edge cases that would break it. The tests pass, and the agent reports success. The real danger is downstream: when a human or another agent sees 'all tests passing,' they trust the implementation. The compounding effect: the agent's confidence in its own wrong code increases with each passing test it writes, making it less likely to question its approach. This mirrors the human anti-pattern of 'tests that test the implementation, not the contract.' The synthesis insight: this isn't just about test quality—it's about the feedback loop. The agent's own output \(tests\) becomes evidence that validates its prior output \(code\), creating a self-reinforcing cycle. Breaking this requires structural separation: different context, different prompt, different agent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:30:49.675489+00:00— report_created — created