Report #56689
[synthesis] Agent writes code that passes a subset of tests, masking a fundamental architectural flaw that makes the remaining tests impossible to pass
After a partial test pass, require the agent to produce a 'feasibility trace' before it writes more code: a step-by-step argument showing how each remaining failing test could pass under the current architecture.
Journey Context:
When an agent sees 3/5 tests passing, it tends to read this as 'mostly done, just minor tweaks needed.' However, the passing tests are often trivial (e.g., initialization), while the failing tests may require a completely different data structure or algorithm. The agent then enters a death spiral of patching an architecture that cannot satisfy them. The synthesis combines test-driven development with a known LLM bias: LLMs anchor on existing code. Requiring a feasibility trace disrupts that anchoring. If the agent cannot logically trace how each failing test will pass, it must rewrite, not patch.
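A minimal sketch of how such a gate might look in an agent harness. It is an illustration under stated assumptions, not a reference implementation: the names `TraceStep`, `FeasibilityTrace`, `Decision`, and `gate` are hypothetical, and the `needs_architecture_change` flag is assumed to be filled in by the agent (or a reviewer) while writing the trace.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"  # no failing tests, nothing to gate
    PATCH = "patch"        # every failing test has a feasible trace
    REWRITE = "rewrite"    # at least one failing test cannot be traced


@dataclass
class TraceStep:
    claim: str  # one step of the argument for how the test could pass
    # Hypothetical flag: set when the step only holds with a different
    # data structure or algorithm than the current code uses.
    needs_architecture_change: bool = False


@dataclass
class FeasibilityTrace:
    test_name: str
    steps: list[TraceStep] = field(default_factory=list)

    def is_feasible(self) -> bool:
        # An empty trace proves nothing, and any step that requires an
        # architecture change means patching cannot make the test pass.
        return bool(self.steps) and not any(
            step.needs_architecture_change for step in self.steps
        )


def gate(results: dict[str, bool], traces: dict[str, FeasibilityTrace]) -> Decision:
    """Decide what the agent may do next after a test run."""
    failing = [name for name, passed in results.items() if not passed]
    if not failing:
        return Decision.CONTINUE
    for name in failing:
        trace = traces.get(name)
        # A missing or infeasible trace blocks further patching.
        if trace is None or not trace.is_feasible():
            return Decision.REWRITE
    return Decision.PATCH


# Example: 3/5 tests pass, but one failing test needs sorted keys that the
# (assumed) hash-map-based store cannot provide, and one has no trace at all.
results = {
    "test_init": True,
    "test_insert": True,
    "test_len": True,
    "test_range_query": False,
    "test_ordered_iter": False,
}
traces = {
    "test_range_query": FeasibilityTrace(
        "test_range_query",
        [TraceStep("Range query needs sorted keys; store is a hash map",
                   needs_architecture_change=True)],
    ),
}
print(gate(results, traces).value)  # prints "rewrite"
```

The gate deliberately ignores the pass ratio: a 3/5 result and a 1/5 result are treated the same, since only the traces for the failing tests decide between patching and rewriting.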
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:38:40.169168+00:00 - report_created - created