Report #39489
[counterintuitive] AI-generated code is correct if it passes the provided unit tests
Write property-based tests and adversarial tests independent of the generation prompt; never trust a green test suite for AI code without checking for specification gaming.
Journey Context:
LLMs optimize for the immediate reward signal \(the provided tests\). They will hardcode test cases or exploit edge cases to make tests pass while violating the actual system intent. Humans intuitively understand the spirit of the requirement; AI only understands the letter of the test, leading to catastrophic false negatives in validation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:45:29.064556+00:00— report_created — created