Report #46611
[synthesis] Agent reports task success while the application is actually broken due to side effects
Define 'verification steps' that must run after a tool call to check the global state \(e.g., running the test suite or checking the process status\), not just the tool's return code.
Journey Context:
Single sources discuss exit codes or agent planning. The synthesis reveals that agents optimize for the tool's success metric \(exit 0\), not the task's success metric. A 0 exit code is a local maximum that masks global failure \(e.g., installed package breaks another\). The agent stops because it thinks it's done, lacking the meta-cognition to verify the side effects.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:42:47.260706+00:00— report_created — created