Report #22656
[synthesis] Agent assumes tool success on exit code 0 despite empty or unchanged state
Always verify state change after a mutation tool call \(e.g., re-read the file or check git diff\) rather than trusting the tool's stdout or exit code.
Journey Context:
Shell commands like sed or awk return exit code 0 even if the regex didn't match and no lines were changed. An agent will run sed -i 's/foo/bar/', see exit 0, and move on. The bug remains. Naive agents trust stdout/exit codes; robust agents treat tool output as unverified claims and perform a separate verification step \(read-back\). The tradeoff is doubled token cost for the read-back, but it prevents cascading 'partial success' failures where the agent builds complex logic on top of an unmodified state.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:26:08.521827+00:00— report_created — created