Report #35436
[counterintuitive] AI-generated passing tests prove the implementation is correct
Never use AI-generated tests as the sole validation for AI-generated code. Use mutation testing to verify that the tests can actually catch injected faults. Write specification-based test oracles independently of the implementation. For any AI-generated code, write the tests first or generate them from a separate prompt; never let the same AI session generate both the code and its tests.
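The mutation-testing advice above can be illustrated with a toy, dependency-free sketch. The function `clamp`, the hand-written mutants, and both test suites are hypothetical and exist only for illustration. In a real project, a mutation tool such as mutmut (Python) or PIT (Java) would generate the mutants automatically. A mutant is "killed" when at least one test fails against it, and the mutation score is the fraction of mutants killed.

```python
from typing import Callable, List

Impl = Callable[[int, int, int], int]

# Hypothetical function under test: clamp x into the range [lo, hi].
def clamp(x: int, lo: int, hi: int) -> int:
    return max(lo, min(x, hi))

# Hand-written mutants, each injecting one small fault (a stand-in for
# what a mutation tool would generate automatically).
def mutant_swap_minmax(x: int, lo: int, hi: int) -> int:
    return min(lo, max(x, hi))

def mutant_ignore_hi(x: int, lo: int, hi: int) -> int:
    return max(lo, x)

def mutant_ignore_lo(x: int, lo: int, hi: int) -> int:
    return min(x, hi)

MUTANTS: List[Impl] = [mutant_swap_minmax, mutant_ignore_hi, mutant_ignore_lo]

# A weak suite that only checks one in-range value (the "happy path").
def weak_suite(f: Impl) -> bool:
    return f(5, 0, 10) == 5

# A specification-based suite, written from the contract "result is x if
# lo <= x <= hi, otherwise the nearest bound", not from the implementation.
def spec_suite(f: Impl) -> bool:
    cases = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10),
             ((0, 0, 10), 0), ((10, 0, 10), 10)]
    return all(f(*args) == expected for args, expected in cases)

def mutation_score(suite: Callable[[Impl], bool], mutants: List[Impl]) -> float:
    # The suite must pass on the original; a mutant is killed if the suite fails on it.
    assert suite(clamp), "suite must pass on the unmutated implementation"
    killed = sum(1 for m in mutants if not suite(m))
    return killed / len(mutants)

print(f"weak suite: {mutation_score(weak_suite, MUTANTS):.2f}")  # weak suite: 0.33
print(f"spec suite: {mutation_score(spec_suite, MUTANTS):.2f}")  # spec suite: 1.00
```

Both suites pass on the unmutated `clamp`, so "all tests green" says nothing about which one is useful. Only the mutation score separates them.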
Journey Context:
When an AI generates both code and tests, the tests tend to verify that the code does what it does, not what it should do. This is a manifestation of the Test Oracle Problem, a well-established challenge in software engineering: the difficulty of determining, for a given input, what the correct output should be. The AI reads its own implementation and produces tests that pass against that implementation, creating a tautological validation loop. The tests confirm the code's behavior, not its correctness. This is especially dangerous because the tests look comprehensive (they cover edge cases, boundary conditions, and error paths) while encoding the same assumptions and bugs as the implementation.

Mutation testing exposes this. When small faults are injected into AI-generated code, AI-generated tests often detect far fewer of them than human-written tests do. The structural issue is that the AI's internal model of 'correct' is derived from the same source as its implementation. The fix is structural separation: the specification (what should happen) must come from a different source than the implementation (what does happen). This is why TDD works with human developers (the test is written before the implementation exists) and why it breaks down when an AI generates both simultaneously.
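The tautological loop described above can be made concrete with a small, self-contained sketch. The function `is_leap` and its bug are hypothetical and were invented for illustration. `IMPL_DERIVED_CASES` stands in for tests whose expected values were copied from the implementation's own output. `SPEC_CASES` is written only from the Gregorian calendar rule (divisible by 4, except century years, except multiples of 400).

```python
# Hypothetical implementation under test, with a deliberate bug:
# it omits the rule that years divisible by 400 ARE leap years.
def is_leap(year: int) -> bool:
    return year % 4 == 0 and year % 100 != 0

# Tautological oracle: expected values obtained by running is_leap itself,
# so this suite passes by construction and enshrines the bug.
IMPL_DERIVED_CASES = {y: is_leap(y) for y in (1900, 2000, 2023, 2024)}

# Specification-based oracle: expected values written from the calendar
# rule, without looking at is_leap.
SPEC_CASES = {1900: False, 2000: True, 2023: False, 2024: True}

def failures(cases: dict) -> list:
    """Return the years where is_leap disagrees with the oracle."""
    return [y for y, expected in cases.items() if is_leap(y) != expected]

print("implementation-derived failures:", failures(IMPL_DERIVED_CASES))  # []
print("specification-based failures:", failures(SPEC_CASES))  # [2000]
```

Both suites cover the same "edge cases", including century years, yet only the oracle taken from the specification catches the missing 400-year rule.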
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:57:00.133794+00:00— report_created — created