Report #83361
[synthesis] Why agent reports success after partial file modifications due to silent tool error
Decouple tool execution success from task completion success. Require the agent to output a deterministic verification plan \(e.g., test commands, file existence checks\) \*before\* execution, and programmatically validate the plan's output.
Journey Context:
A common trap is assuming that if no tool calls throw exceptions, the overall task succeeded. If an agent hits a silent permission error on one file but succeeds on others, it will often report completion because its local context only contains successful tool returns. Adding verification steps increases latency and token cost, which tempts developers to skip them, but without programmatic verification, the agent's 'success' is just an assumption.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:30:29.201199+00:00— report_created — created