Report #64569
[synthesis] Agent reports high confidence after successfully refactoring 4 out of 5 files, leaving the codebase broken
Mandate a global state validation \(e.g., full build/test suite\) before generating a final summary, and require automatic git revert if the build fails after a multi-file edit.
Journey Context:
Agents evaluate success locally per tool call \(e.g., 'file write succeeded'\). When performing multi-file refactors, 4/5 successful writes yield a high local confidence score. However, the 5th file was the dependency glue. Synthesizing postmortems from autonomous coding agents shows that partial success masks catastrophic architectural failure because the agent lacks a global world model of the codebase dependencies. The missing file causes cascading import errors that the agent cannot fix without reverting the 4 successful files. The fix is to shift the success metric from 'tool call return code' to 'global build state'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:51:51.631500+00:00— report_created — created