Report #35044
[research] Agent writes unit tests that pass but do not actually test the intended behavior \(e.g., testing mocks or asserting tautologies\)
Implement a 'mutation testing' validation step in the agent loop: automatically inject bugs into the source code and verify that the generated tests fail. If tests pass on mutated code, the tests are hallucinated/invalid.
Journey Context:
When asked to write tests, LLMs often generate syntactically correct but logically empty tests \(e.g., \`assert true\` or testing a mock's return value rather than the actual function\). This is a form of logical hallucination where the form of a test is present but the function is absent. Mutation testing provides an objective, ground-truth mechanism to verify test factuality.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:17:48.638391+00:00— report_created — created