Report #94131
[counterintuitive] Can AI write tests to validate its own generated code?
Never let AI write tests for code it also generated in the same session. Instead: (1) Write tests yourself from the specification, (2) Use spec-first TDD where AI writes tests from the spec before writing implementation, or (3) Have a separate AI session write tests against the spec without seeing the implementation. The key invariant: tests and implementation must be derived independently from the specification.
Journey Context:
When AI generates code and then generates tests for that code, both encode the same (possibly wrong) mental model of the requirements. The tests pass, creating a dangerous false positive. This is the circular validation trap: the tests validate the implementation's assumptions, not the specification's requirements. The AI writes tests that exercise the code paths it implemented, not the edge cases it missed or misunderstood, so you get high coverage numbers that don't catch the actual bugs. This is especially insidious because the code and tests look correct in isolation: they are consistent with each other, just not with what was actually needed. Martin Fowler's insight that high test coverage can coexist with poor testing applies doubly here: AI-generated tests for AI-generated code maximize the coverage-while-missing-bugs dynamic. The fix mirrors a fundamental testing principle: tests should be derived from the specification independently of the implementation.
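The circular validation trap described above can be illustrated with a minimal sketch. Everything in it is hypothetical and not taken from the report: the spec ("orders totalling 50.00 or more ship free; all others pay a flat 5.00"), the function `shipping_cost`, and the two check functions. The implementation misreads "or more" as "more than". Checks written by reading the code cover both branches and pass, while checks written from the spec text alone probe the boundary and expose the bug.

```python
from decimal import Decimal

# Hypothetical spec (an assumption for illustration):
# orders totalling 50.00 or more ship free; all others pay a flat 5.00.
FREE_SHIPPING_THRESHOLD = Decimal("50.00")
FLAT_RATE = Decimal("5.00")
FREE = Decimal("0.00")


def shipping_cost(order_total: Decimal) -> Decimal:
    """Implementation that misreads the spec: '>' where the spec says 'or more'."""
    if order_total > FREE_SHIPPING_THRESHOLD:  # bug: the spec requires >=
        return FREE
    return FLAT_RATE


def implementation_derived_checks() -> bool:
    """Checks written by reading the code above.

    They exercise both branches (full branch coverage) but inherit the
    same misreading, so the boundary value is never probed.
    """
    return (
        shipping_cost(Decimal("60.00")) == FREE
        and shipping_cost(Decimal("40.00")) == FLAT_RATE
    )


def spec_derived_checks() -> bool:
    """Checks written from the spec text alone, without seeing the code.

    The phrase 'or more' prompts a check exactly at the threshold.
    """
    return (
        shipping_cost(Decimal("50.00")) == FREE
        and shipping_cost(Decimal("49.99")) == FLAT_RATE
        and shipping_cost(Decimal("60.00")) == FREE
    )


if __name__ == "__main__":
    print("implementation-derived checks pass:", implementation_derived_checks())
    print("spec-derived checks pass:", spec_derived_checks())
```

Running the script reports that the implementation-derived checks pass and the spec-derived checks fail. Both sets of checks are plausible on their own; only the set derived independently from the spec disagrees with the buggy code, which is the signal that matters.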
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:35:14.102600+00:00 - report_created - created