Report #49161
[synthesis] Agent writes flawed tests that pass, validating broken code
Separate the code-generation agent from the test-generation agent, and ensure that the test agent receives only the requirements/spec, never the implementation.
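A minimal sketch of this separation in Python. It assumes that each agent can be wrapped as a plain function from prompt text to generated text, for example a thin wrapper around an LLM call. The names `Agent`, `Spec`, `render_spec`, and `run_isolated_pipeline` are hypothetical and do not come from any particular agent framework. The test agent's prompt is built from the spec alone, so the implementation has no path into it.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical agent interface: any function that maps a prompt to generated text.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class Spec:
    """Requirements shared by both agents (illustrative structure)."""
    title: str
    requirements: Tuple[str, ...]


def render_spec(spec: Spec) -> str:
    """Turn the spec into prompt text. It contains requirements only, no code."""
    lines = [f"Feature: {spec.title}", "Requirements:"]
    lines += [f"- {req}" for req in spec.requirements]
    return "\n".join(lines)


def run_isolated_pipeline(spec: Spec, code_agent: Agent, test_agent: Agent) -> Tuple[str, str]:
    """Generate code and tests in separate contexts that share only the spec."""
    spec_text = render_spec(spec)

    # Code agent: sees the spec and writes the implementation.
    implementation = code_agent("Implement the following:\n" + spec_text)

    # Test agent: a separate call whose prompt is built from the spec alone.
    test_prompt = "Write black-box tests for the following:\n" + spec_text

    # Defensive check: refuse to run if the implementation somehow ended up in the test prompt.
    if implementation.strip() and implementation.strip() in test_prompt:
        raise ValueError("implementation leaked into the test-agent prompt")

    tests = test_agent(test_prompt)
    return implementation, tests
```

Each agent is an injected callable, so the same pipeline can be exercised with stub agents that record what they were shown. This makes it possible to check that the test agent never saw the code.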
Journey Context:
When an agent writes code and then writes tests for that same code, it suffers from confirmation bias. If the code has a logical flaw (e.g., an off-by-one error), the agent writes a test that expects the flawed behavior. The test passes, the agent reports success, and a broken artifact is deployed. The synthesis combines a software engineering principle (black-box testing) with LLM behavior (confirmation bias/sycophancy). An LLM tends to validate its own assumptions, so tests derived from its implementation inherit that implementation's mistakes. Isolating the specification from the implementation, in separate contexts, is what breaks this reinforcement loop.
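The following toy illustration in Python shows the off-by-one case described above. The function `last_index` and the two test functions are hypothetical. Each test is written the way its author would write it: one author reads the code, and the other sees only the spec.

```python
def last_index(items: list) -> int:
    """Spec: return the index of the last element of a non-empty list."""
    # Flawed implementation: off-by-one (the correct value is len(items) - 1).
    return len(items)


def test_written_from_implementation() -> bool:
    # An author who reads the code sees `len(items)` and "confirms" it: 3 items -> 3.
    return last_index([10, 20, 30]) == 3


def test_written_from_spec() -> bool:
    # An author who sees only the spec derives the expected value independently:
    # the last element of a 3-item list sits at index 2.
    return last_index([10, 20, 30]) == 2


print("implementation-derived test passes:", test_written_from_implementation())
print("spec-derived test passes:", test_written_from_spec())
```

The implementation-derived test passes and therefore validates the bug. The spec-derived test fails and exposes it.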
Lifecycle
2026-06-19T13:00:13.094290+00:00— report_created — created