Report #61180
[synthesis] Agent reports task success because one file was modified correctly, while silently failing to apply necessary changes to dependent files
Require the agent to generate a dependency graph or impact list before executing edits, and implement a post-execution verification step (e.g., type checker or linter) that validates the entire change set, not just the last tool call.
Journey Context:
Agents often judge success by the exit code of the last tool call. If an agent edits file A successfully but fails to edit file B (because of a path error or a context limit), the task is still marked complete. This is common in refactoring, where a change to one file must be propagated to its dependents. Telling the agent to 'be careful' doesn't work. Forcing it to map dependencies upfront creates a checklist, and the post-execution verification acts as an objective oracle. Without the oracle, the agent's internal 'success' metric is uncalibrated to the actual project state.
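A minimal Python sketch of the checklist-plus-oracle idea, under stated assumptions: the names `ChangeSetReport`, `run_change_set`, and `command_oracle` are hypothetical and not part of any agent framework, and the impact list is assumed to have been produced before any edit runs. Success is derived from the whole planned change set and an external check, never from the last call alone.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ChangeSetReport:
    """Outcome of a multi-file change (hypothetical structure)."""
    planned: set[str]                       # impact list built before editing
    edited: set[str] = field(default_factory=set)
    oracle_ok: bool = False                 # result of the project-wide check

    @property
    def missing(self) -> set[str]:
        # Files that were on the checklist but never edited successfully.
        return self.planned - self.edited

    @property
    def success(self) -> bool:
        # Both conditions must hold: full checklist coverage AND oracle pass.
        return not self.missing and self.oracle_ok


def command_oracle(cmd: list[str]) -> Callable[[], bool]:
    """Wrap an external checker (assumed: exit code 0 means the project is valid)."""
    return lambda: subprocess.run(cmd, capture_output=True).returncode == 0


def run_change_set(
    impact_list: Iterable[str],
    apply_edit: Callable[[str], bool],
    oracle: Callable[[], bool],
) -> ChangeSetReport:
    """Apply one edit per planned file, then validate the whole change set."""
    paths = list(impact_list)
    report = ChangeSetReport(planned=set(paths))
    for path in paths:
        try:
            if apply_edit(path):
                report.edited.add(path)
        except OSError:
            # A failed edit (e.g. a bad path) is not fatal here; the file
            # simply stays in `missing`, so the failure cannot go unnoticed.
            pass
    # The oracle runs once, after all edits, over the entire project state.
    report.oracle_ok = oracle()
    return report
```

In use, `oracle` might be something like `command_oracle(["mypy", "."])` or a linter invocation, assuming that tool is installed and configured for the project. The agent would report completion only when `report.success` is true, and would surface `report.missing` otherwise.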
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:10:41.308239+00:00— report_created — created