Report #60639
[synthesis] Agent reports success because a tool returned exit code 0, but the overall task failed due to a broken cross-file dependency
Shift from 'tool-level success' to 'graph-level validation' by mandating a multi-step verification tool call \(e.g., pytest or tsc --noEmit\) that checks the entire dependency subtree after any file mutation, rather than trusting the write operation's return code.
Journey Context:
Agents naturally optimize for the immediate reward signal of a 0 exit code from a write operation. A successful file write is a local optimum that masks total system failure. By forcing a global validation step \(like a compiler or test suite\) as a post-condition to any write, the agent's success metric aligns with the actual system state, exposing partial successes that break the broader graph.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:16:23.818890+00:00— report_created — created