Report #97557
[counterintuitive] If AI-generated code has high test coverage, it is probably correct
Review AI-generated tests for whether they test the specification, not the implementation. Require explicit adversarial and boundary inputs; reject tests that merely echo the code's assumptions.
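As a rough illustration of what "explicit adversarial and boundary inputs" can look like, the sketch below checks a function against properties taken from its specification rather than against values copied from its output. The function `chunk`, its stated contract, and the input list are hypothetical and chosen only for this example.

```python
from itertools import chain


def chunk(items, size):
    """Hypothetical function under test: split items into lists of at most `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def check_spec(items, size):
    """Assert properties the (assumed) specification states, not what the code happens to return."""
    chunks = chunk(items, size)
    # Concatenating the chunks must give back the original sequence.
    assert list(chain.from_iterable(chunks)) == list(items)
    # Every chunk except possibly the last is exactly `size` long.
    assert all(len(c) == size for c in chunks[:-1])
    # No chunk is empty or oversized.
    assert all(0 < len(c) <= size for c in chunks)
    return True


# Boundary inputs picked from the contract: empty input, single element,
# size equal to the length, size larger than the length, size of one, uneven split.
BOUNDARY_INPUTS = [
    ([], 1),
    ([1], 1),
    ([1, 2, 3], 3),
    ([1, 2, 3], 4),
    ([1, 2, 3], 1),
    (list(range(10)), 3),
]


def run_boundary_checks():
    return all(check_spec(items, size) for items, size in BOUNDARY_INPUTS)


def rejects_invalid_size():
    """Adversarial inputs: a non-positive size must raise ValueError."""
    for bad in (0, -1):
        try:
            chunk([1, 2], bad)
        except ValueError:
            continue
        return False
    return True


if __name__ == "__main__":
    print("boundary checks pass:", run_boundary_checks())
    print("invalid sizes rejected:", rejects_invalid_size())
```

A test written this way would still fail if the implementation were regenerated with a different bug, which is the point: the expectations come from the contract, not from the code.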
Journey Context:
AI can generate tests that exercise every line of its own code without ever challenging it. The coverage metric looks good, but because the same model wrote both the code and the tests, the tests share the model's blind spots: a wrong assumption in the implementation reappears as the expected value in the test. Fuzzing research suggests LLMs are useful zero-shot fuzzers, yet they still miss edge cases. High coverage from AI-generated tests therefore measures self-consistency, not correctness.
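A minimal sketch of the failure mode, assuming a hypothetical `is_leap_year` implementation that omits the Gregorian century rule. The "echo" cases execute every line and branch of the function and all pass. The spec-derived cases add the century boundaries and expose the bug. All names and case lists here are illustrative.

```python
def is_leap_year(year: int) -> bool:
    """Hypothetical generated implementation; it ignores the century rule."""
    if year % 4 == 0:
        return True
    return False


# Cases that mirror the implementation's own assumption (divisible by 4 or not).
# Together they execute every line and both branches of is_leap_year.
ECHO_CASES = [(2024, True), (2023, False)]

# Cases derived from the Gregorian rule: century years are leap years
# only when divisible by 400.
SPEC_CASES = [(1900, False), (2000, True), (2100, False), (2024, True), (2023, False)]


def failures(cases):
    """Return the (year, expected) pairs the implementation gets wrong."""
    return [(year, expected) for year, expected in cases if is_leap_year(year) != expected]


if __name__ == "__main__":
    print("echo failures:", failures(ECHO_CASES))  # empty: full coverage, no bug found
    print("spec failures:", failures(SPEC_CASES))  # the century years 1900 and 2100
```

Both suites report the same line coverage for `is_leap_year`; only the second one says anything about correctness.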
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:19:12.678501+00:00— report_created — created