Report #95103
[counterintuitive] AI-generated tests with high coverage mean the code is well-tested
Supplement coverage metrics with mutation testing. AI-generated tests often exercise code paths without asserting meaningful invariants. A test that calls a function and checks it does not throw is coverage, not quality. Use mutation testing to reveal the gap between what your tests execute and what they actually verify.
Journey Context:
AI can rapidly generate tests that achieve high line and branch coverage, creating a dangerous false sense of security. The problem: AI-generated tests tend to exercise code paths without asserting meaningful properties. They test that code runs, not that it is correct. This amplifies the known coverage-quality gap: coverage measures what code was executed, not whether the tests would catch bugs. Mutation testing (intentionally introducing small semantic changes and checking whether the tests catch them) tends to reveal that AI-generated tests with high coverage have low mutation kill rates. The coverage number looks great; the actual protection is minimal. The counterintuitive part is that more AI-generated tests can make you less safe than fewer well-designed tests, because the high coverage number leads reviewers to stop thinking about what else needs testing.
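The gap between executing code and verifying it can be illustrated with a minimal hand-rolled sketch. Everything here is hypothetical and made up for illustration: `apply_discount`, its hand-written mutant, and the two test functions. It does not reproduce any particular tool; real mutation testing tools (for example mutmut for Python, PIT for Java, or Stryker for JavaScript) generate and run mutants automatically.

```python
from typing import Callable

DiscountFn = Callable[[float, float], float]


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return price - price * percent / 100


def apply_discount_mutant(price: float, percent: float) -> float:
    """Hand-written mutant: the '-' operator is flipped to '+'."""
    return price + price * percent / 100


def weak_test(fn: DiscountFn) -> bool:
    """Coverage-only test: executes every line of fn, but only
    checks that the call does not raise."""
    try:
        fn(100.0, 20.0)
        return True
    except Exception:
        return False


def strong_test(fn: DiscountFn) -> bool:
    """Test that asserts actual invariants of the result."""
    return fn(100.0, 20.0) == 80.0 and fn(50.0, 0.0) == 50.0


def kills(test: Callable[[DiscountFn], bool], mutant: DiscountFn) -> bool:
    """A mutant is 'killed' when the test fails against it."""
    return not test(mutant)


if __name__ == "__main__":
    # Both tests pass on the original and give identical line coverage.
    print("weak passes on original:  ", weak_test(apply_discount))
    print("strong passes on original:", strong_test(apply_discount))
    # Only the strong test notices the flipped operator.
    print("weak kills mutant:  ", kills(weak_test, apply_discount_mutant))
    print("strong kills mutant:", kills(strong_test, apply_discount_mutant))
```

Both tests report the same coverage for `apply_discount`, yet only `strong_test` fails when the operator is flipped. The mutation kill rate is the fraction of such mutants that the suite detects, which is why it can stay low while coverage is high.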
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:12:30.021002+00:00 — report_created — created