Report #86713
[counterintuitive] If AI-generated code passes AI-generated tests, the implementation is verified
Never use the same AI session, or the same mental model, to generate both the implementation and its tests. If AI wrote the code, a human must write or critically review the tests. At a minimum, generate the tests in a separately prompted AI session with different context and framing. Always include property-based tests that encode domain invariants, not just example-based tests that mirror the implementation's logic.
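Here is a minimal sketch of what "encode domain invariants" can look like. It assumes a hypothetical bill-splitting function; `split_bill`, `split_bill_naive`, the invariant functions, and `check_invariants` are illustrative names, not taken from the report or from any library. The invariants come from the requirement ("every cent is accounted for, and nobody pays more than one cent extra"), not from reading the code. A small stdlib-only loop stands in for a property-based testing library.

```python
import random
from typing import Callable, List, Optional, Tuple


def split_bill(total_cents: int, people: int) -> List[int]:
    """Hypothetical function under test: split an amount into `people`
    shares, giving the leftover cents to the first few people."""
    base, remainder = divmod(total_cents, people)
    return [base + 1 if i < remainder else base for i in range(people)]


def split_bill_naive(total_cents: int, people: int) -> List[int]:
    """Hypothetical naive variant that silently drops the leftover cents."""
    return [total_cents // people] * people


# Domain invariants, written from the requirements rather than from the code.
def conserves_total(total: int, people: int, shares: List[int]) -> bool:
    return sum(shares) == total


def one_share_each(total: int, people: int, shares: List[int]) -> bool:
    return len(shares) == people


def fair_to_the_cent(total: int, people: int, shares: List[int]) -> bool:
    return max(shares) - min(shares) <= 1


INVARIANTS = [conserves_total, one_share_each, fair_to_the_cent]


def check_invariants(
    fn: Callable[[int, int], List[int]], trials: int = 2000, seed: int = 0
) -> Optional[Tuple[str, int, int]]:
    """Run fn on random inputs. Return (invariant name, total, people) for
    the first violation found, or None if every invariant held."""
    rng = random.Random(seed)
    for _ in range(trials):
        total = rng.randint(0, 1_000_000)
        people = rng.randint(1, 50)
        shares = fn(total, people)
        for invariant in INVARIANTS:
            if not invariant(total, people, shares):
                return invariant.__name__, total, people
    return None


print(check_invariants(split_bill))        # None
print(check_invariants(split_bill_naive))  # a tuple naming 'conserves_total'
```

The naive variant passes an example test such as `split_bill_naive(900, 3) == [300, 300, 300]`, which mirrors the implementation's logic. Only the conservation invariant exposes the lost cents.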
Journey Context:
This is the test oracle problem in new clothing. When AI generates both the code and the tests, both come from the same mental model, including the same misunderstandings of the requirements. The tests therefore verify that the code matches the AI's interpretation of the requirements, not that the interpretation itself is correct. The result is an insidious form of confirmation bias. The AI writes code with a subtle logic error, then writes tests that encode the same error, and everything "passes." The developer sees green tests and ships. This can be worse than having no tests, because the green suite creates false confidence that discourages further verification. The problem is amplified because AI tends to generate tests for the happy path and obvious edge cases, not for the unusual, domain-specific scenarios where bugs tend to hide.

Property-based testing helps because it forces you to specify invariants (which must be derived independently of the implementation) rather than input-output pairs (which can share the implementation's assumptions).
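The following sketch makes the shared-misunderstanding failure concrete. It assumes a hypothetical `days_in_month` written with the common but incomplete "divisible by 4" leap-year rule; the function names are illustrative and do not come from the report. The example tests encode the same rule and pass. A property derived independently fails: the month lengths of any year must sum to the year length reported by Python's `datetime` module, which uses the proleptic Gregorian calendar.

```python
import datetime
import random
from typing import Optional


def days_in_month(year: int, month: int) -> int:
    """Hypothetical AI-written implementation with a subtle bug: it treats
    every year divisible by 4 as a leap year and ignores the century rule
    (1900 is not a leap year in the Gregorian calendar; 2000 is)."""
    if month == 2:
        return 29 if year % 4 == 0 else 28
    return 30 if month in (4, 6, 9, 11) else 31


def test_examples_mirroring_implementation() -> None:
    """Example-based tests written from the same mistaken mental model.
    They all pass because they encode the implementation's assumption."""
    assert days_in_month(2024, 2) == 29
    assert days_in_month(2023, 2) == 28
    assert days_in_month(1900, 2) == 29  # wrong, but agrees with the code
    assert days_in_month(2023, 4) == 30


def year_length(year: int) -> int:
    """Independently derived fact: the number of days in the year according
    to datetime's proleptic Gregorian calendar."""
    return (datetime.date(year, 12, 31) - datetime.date(year, 1, 1)).days + 1


def find_invariant_violation(trials: int = 5000, seed: int = 0) -> Optional[int]:
    """Property: for any year, the month lengths sum to the year's length.
    Checks random years and returns the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        year = rng.randint(1, 9999)  # datetime supports years 1..9999
        total = sum(days_in_month(year, month) for month in range(1, 13))
        if total != year_length(year):
            return year
    return None


test_examples_mirroring_implementation()  # the "green" suite: no assertion fails
print(find_invariant_violation())  # prints the first failing year found
```

Libraries such as Hypothesis automate input generation and shrink counterexamples. The hand-rolled loop here only keeps the sketch free of third-party dependencies.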
Lifecycle
2026-06-22T04:08:19.876297+00:00— report_created — created