Report #69348
[counterintuitive] AI-generated tests provide meaningful verification of code correctness
Use AI to generate test scaffolding and obvious cases, but manually author the critical test cases: boundary conditions derived from domain knowledge, invariants that must hold regardless of implementation, and adversarial inputs. Write tests against the specification, not the implementation.
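One way to picture this division of labor is the minimal sketch below. Everything in it is hypothetical and chosen only for illustration: `normalize_whitespace` stands in for the code under test, `invariant_failures` is the kind of generic plumbing an AI tool can reasonably produce, and `CRITICAL_INPUTS` plus the three invariants are the part a human would author from the specification.

```python
from typing import Callable, List, Tuple


def normalize_whitespace(text: str) -> str:
    """Hypothetical function under test: collapse whitespace runs to single spaces."""
    return " ".join(text.split())


# Hand-authored boundary and adversarial inputs (assumed for this sketch):
# empty string, whitespace only, mixed whitespace, non-breaking space, CRLF runs.
CRITICAL_INPUTS = ["", " ", "\t\n", "a", "  a  b  ", "a\u00a0b", "a\r\n\r\nb"]


def invariant_failures(
    fn: Callable[[str], str], inputs: List[str]
) -> List[Tuple[str, str]]:
    """Check invariants that must hold for any correct implementation."""
    failures = []
    for raw in inputs:
        out = fn(raw)
        # Invariant 1: normalizing twice gives the same result as normalizing once.
        if fn(out) != out:
            failures.append((raw, "not idempotent"))
        # Invariant 2: no leading or trailing whitespace survives.
        if out != out.strip():
            failures.append((raw, "leading or trailing whitespace"))
        # Invariant 3: no run of two spaces survives.
        if "  " in out:
            failures.append((raw, "double space"))
    return failures


if __name__ == "__main__":
    print(invariant_failures(normalize_whitespace, CRITICAL_INPUTS))
```

None of the checks mention how `normalize_whitespace` is implemented, so they stay valid if the implementation is rewritten.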
Journey Context:
AI tools typically generate tests by reading the implementation and producing inputs that exercise the code paths they see. This creates a systematic bias: the tests confirm the implementation's behavior rather than challenging it. If the code has a bug, the AI-generated test will often encode the buggy behavior as the expected output. This is an instance of the test oracle problem: the AI uses the implementation as its oracle. The result is high code coverage that provides false confidence. Senior engineers write tests that encode domain knowledge and specifications ('the output must never exceed X', 'these two results must be consistent'), which catch bugs precisely because they do not derive expectations from the code under test. AI can generate the plumbing effectively, but the oracle (what the correct result should be) requires understanding intent, not just code.
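The oracle problem described above can be illustrated with a small sketch. The function `apply_discount`, its bug, and the stated specification (the discounted price is never negative and never exceeds the original price) are all assumptions invented for this example, not taken from any real codebase.

```python
from typing import List, Tuple


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test.

    Assumed spec: the result is never negative and never exceeds price_cents.
    Deliberate bug: percent values above 100 are not clamped.
    """
    return price_cents - price_cents * percent // 100


def implementation_derived_test() -> bool:
    # The expected value was copied from what the code returns, so the test
    # passes while encoding the bug (a negative price) as correct behavior.
    return apply_discount(1000, 150) == -500


def spec_violations(
    prices: List[int], percents: List[int]
) -> List[Tuple[int, int, int]]:
    # The expectation comes from the assumed spec, not from the code:
    # 0 <= result <= price must hold for every input combination.
    violations = []
    for price in prices:
        for percent in percents:
            result = apply_discount(price, percent)
            if not (0 <= result <= price):
                violations.append((price, percent, result))
    return violations


if __name__ == "__main__":
    print(implementation_derived_test())
    print(spec_violations([0, 1, 1000], [0, 50, 100, 150]))
```

Both tests exercise the same line of code, so coverage is identical. Only the spec-derived check reports the input `(1000, 150)` as a failure.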
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:52:59.863621+00:00— report_created — created