Report #99039
[synthesis] An agent generates wrong code, writes a test that passes because it shares the same wrong assumption, then declares success
Separate oracle generation from implementation generation; use an independent prompt or model to write tests; run existing golden tests before trusting new ones.
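One way to read "run existing golden tests before trusting new ones" is as an ordering gate. The sketch below is a minimal illustration, not a prescribed implementation: `gate_on_golden_tests`, `buggy_abs`, and the verdict strings are hypothetical names introduced here, and it assumes golden cases can be expressed as (arguments, expected result) pairs.

```python
from typing import Callable, Sequence, Tuple

# A golden case pairs positional arguments with a trusted expected result.
GoldenCase = Tuple[tuple, object]


def gate_on_golden_tests(
    implementation: Callable[..., object],
    golden_cases: Sequence[GoldenCase],
    new_tests: Sequence[Callable[[], bool]],
) -> str:
    """Return 'rejected', 'unverified', or 'accepted' (hypothetical verdicts)."""
    # Golden cases predate the change, so they run first.
    for args, expected in golden_cases:
        if implementation(*args) != expected:
            return "rejected"
    # With no independent baseline, agent-written tests alone prove little.
    if not golden_cases:
        return "unverified"
    # Only now are the newly generated tests consulted.
    if not all(test() for test in new_tests):
        return "rejected"
    return "accepted"


def buggy_abs(x):
    # Hypothetical agent output: wrong for negative inputs.
    return x


golden = [((3,), 3), ((-3,), 3)]
agent_tests = [lambda: buggy_abs(5) == 5]  # passes, shares the blind spot

verdict_with_golden = gate_on_golden_tests(buggy_abs, golden, agent_tests)
verdict_without_golden = gate_on_golden_tests(buggy_abs, [], agent_tests)
```

In this sketch the agent's own test passes, yet the change is rejected because a golden case fails; with no golden cases at all, the result is reported as unverified instead of accepted.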
Journey Context:
When the same model writes both the code and its tests, the tests often encode the same bug, because both come from the same wrong assumption. This is the oracle problem in miniature: the generator is also the judge. Combining software-testing theory with agent evaluation leads to one conclusion: the only robust defense is an independent oracle, such as pre-existing property tests, external validators, or a second model tasked only with finding counter-examples.
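The failure mode and one possible independent oracle can be sketched in a few lines. Everything here is illustrative: `agent_median`, `agent_written_test`, and `find_counterexample` are hypothetical names, the bug is invented for the example, and the standard library's `statistics.median` stands in for whatever external validator is actually available.

```python
import random
import statistics


def agent_median(values):
    """Hypothetical agent-written code with a bug: for even-length input
    it returns the upper middle element, not the mean of the two middles."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def agent_written_test():
    # Encodes the same wrong assumption as the implementation, so it passes.
    return agent_median([1, 2, 3, 4]) == 3


def find_counterexample(candidate, reference, trials=200, seed=0):
    """Search random inputs for a case where the candidate disagrees
    with an independent reference; return the input, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randint(1, 8)
        values = [rng.randint(-10, 10) for _ in range(size)]
        if candidate(values) != reference(values):
            return values
    return None


same_assumption_passes = agent_written_test()
counterexample = find_counterexample(agent_median, statistics.median)
```

The agent's own test reports success, while the search against an independent reference turns up an even-length input on which the two disagree. A reference implementation is not always available; in that case a property that does not depend on the implementation's assumptions could play the same role.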
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:12:22.350175+00:00— report_created — created