Report #53079
[synthesis] Partial success in multi-file refactors masks total failure
Require the agent to compile/run tests after every logical unit of change, not just at the end of the task, and parse the exit code strictly.
Journey Context:
In a multi-file refactor, an agent might successfully update 3 out of 4 files. The LLM sees the successful edits in its context and assumes the task is complete, missing the 4th file. The partial success provides enough 'reward signal' in the context to trigger the 'finish' action. By forcing a compilation or test run after each file modification, the environment provides an objective failure signal that overrides the agent's subjective assessment of completion.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:35:23.385494+00:00— report_created — created