Report #96368
[counterintuitive] Are AI-generated tests with high coverage effective tests?
Evaluate AI-generated tests by invariant coverage, not line coverage. Manually verify that tests assert meaningful postconditions (state transitions, invariant preservation, boundary conditions) rather than just exercising code paths. Use AI to generate test cases for pure functions and edge conditions, but write state and integration tests manually. Apply mutation testing to validate test quality.
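The mutation-testing idea can be illustrated by hand before reaching for a tool. The sketch below is a minimal illustration, not a real mutation-testing framework: `clamp`, `clamp_mutant`, `weak_test`, `strong_test`, and `mutant_killed` are all hypothetical names invented for this example. Both tests execute every line of `clamp`, so both would report full line coverage, but only the one that asserts postconditions notices the injected bug.

```python
def clamp(value, low, high):
    """Reference implementation: restrict value to the closed range [low, high]."""
    return max(low, min(value, high))


def clamp_mutant(value, low, high):
    """Hand-written mutant (assumed bug): the upper bound is silently ignored."""
    return max(low, value)


def weak_test(fn):
    """Coverage-style test: runs the code and only checks that nothing raised."""
    fn(5, 0, 10)
    fn(50, 0, 10)
    return True


def strong_test(fn):
    """Postcondition test: result stays in bounds; in-range input is unchanged."""
    cases = [(-5, 0, 10), (0, 0, 10), (5, 0, 10), (10, 0, 10), (50, 0, 10)]
    for value, low, high in cases:
        result = fn(value, low, high)
        if not (low <= result <= high):
            return False
        if low <= value <= high and result != value:
            return False
    return True


def mutant_killed(test, original, mutant):
    """A test kills a mutant if it passes on the original and fails on the mutant."""
    return test(original) and not test(mutant)


if __name__ == "__main__":
    print("weak test kills mutant:  ", mutant_killed(weak_test, clamp, clamp_mutant))
    print("strong test kills mutant:", mutant_killed(strong_test, clamp, clamp_mutant))
```

In practice a mutation-testing tool generates and runs the mutants automatically; the point of the sketch is only that a surviving mutant signals a test suite that exercises code without checking its behavior.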
Journey Context:
AI-generated tests frequently reach 80% or more line coverage while testing almost nothing meaningful. Two patterns recur: tests that call a function with typical inputs and assert that the output equals whatever the function currently returns for those inputs (circular validation), and tests that exercise code paths but assert only that no exception was thrown. The result is a dangerous coverage illusion: the coverage report looks green and gives false confidence. The root cause is that AI optimizes for the measurable metric (coverage) rather than the real goal (invariant verification). AI is genuinely excellent at generating exhaustive edge-case inputs for pure functions (boundary values, empty inputs, maximum sizes), because this is a mechanical task where its pattern knowledge excels. Testing stateful behavior, however, requires understanding which invariants must hold across operations, and that is exactly the system-level reasoning AI lacks. Martin Fowler's canonical guidance is clear: test coverage is a useful tool for finding untested parts of a codebase but is of little use as a numeric statement of how good your tests are. The practical approach: let AI generate the parametric test scaffolding and edge-case inputs, then manually specify the assertions that verify invariants.
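A minimal sketch of that division of labor, under stated assumptions: `dedupe_preserving_order` is a hypothetical function under test, `EDGE_CASES` stands in for the kind of input list an assistant can enumerate mechanically, and `check_invariants` holds the hand-written postconditions. None of these names come from a real library. The checks are phrased in terms of the input and output only, so they do not simply replay whatever the implementation returns.

```python
def dedupe_preserving_order(items):
    """Hypothetical function under test: drop duplicates, keep first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Edge-case inputs: empty, single element, all duplicates, interleaved
# duplicates, a large input, and values that differ only by case or emptiness.
EDGE_CASES = [
    [],
    [1],
    [1, 1, 1],
    [3, 1, 3, 2, 1],
    list(range(1000)) * 2,
    ["", "a", "", "A"],
]


def check_invariants(original, result):
    """Hand-written postconditions, stated independently of the implementation."""
    # No duplicates remain in the output.
    assert len(result) == len(set(result))
    # Nothing is invented and nothing is lost.
    assert set(result) == set(original)
    # Elements appear in the order of their first occurrence in the input.
    first_positions = [original.index(x) for x in result]
    assert first_positions == sorted(first_positions)


def run_all():
    """Apply every invariant to every edge case; return the number of cases run."""
    for case in EDGE_CASES:
        check_invariants(case, dedupe_preserving_order(list(case)))
    return len(EDGE_CASES)


if __name__ == "__main__":
    print(f"{run_all()} edge cases satisfied all invariants")
```

A circular test would instead record the current output for each case and assert equality with it, which passes for a buggy implementation just as readily as for a correct one.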
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:20:14.388360+00:00— report_created — created