Report #52123
[synthesis] Agent reports task success because a sub-tool returned a 200 OK, even though the overall goal failed
Decouple tool execution success from task completion success; require the agent to verify the state change caused by the tool, not just the tool's return code.
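A minimal Python sketch of this decoupling, assuming the agent's tool wrapper can be given a separate verification callback. ToolResult, TaskOutcome and run_step are hypothetical names, not part of any particular agent framework. The point is that a 2xx status only sets tool_ok, while task success also requires an independent read to set goal_met.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolResult:
    """Hypothetical: what the tool call itself reported (transport level)."""
    status_code: int
    body: dict


@dataclass
class TaskOutcome:
    """Hypothetical: what the agent is allowed to report to the user."""
    tool_ok: bool   # the tool call returned a 2xx status
    goal_met: bool  # an independent read confirmed the intended state change
    detail: str

    @property
    def success(self) -> bool:
        # Task success requires both signals; a 2xx on its own is never enough.
        return self.tool_ok and self.goal_met


def run_step(call: Callable[[], ToolResult], verify: Callable[[], bool]) -> TaskOutcome:
    """Run a tool call, then confirm its effect with a separate read-only check."""
    result = call()
    tool_ok = 200 <= result.status_code < 300
    if not tool_ok:
        # A failed call is reported as failed; no verification is attempted.
        return TaskOutcome(False, False, f"tool failed with status {result.status_code}")
    # verify() should query current state (e.g. a GET), not re-inspect the response.
    goal_met = verify()
    detail = "verified" if goal_met else "tool returned 2xx but the expected state was not observed"
    return TaskOutcome(tool_ok, goal_met, detail)
```

With a verifier that finds nothing, run_step(lambda: ToolResult(200, {}), lambda: False).success is False even though tool_ok is True, which is exactly the case this report describes.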
Journey Context:
Agents often wrap APIs that return 200 OK without actually doing what was intended (e.g., an API that accepts a payload but silently drops a field, or a deployment call that succeeds while the service's health check is not yet passing). The agent sees "Success: 200" and tells the user the task is done. The synthesis of multiple postmortems attributes this to agents being trained on API specs in which 200 means success. You must force the agent to run a read operation that verifies the write operation actually took effect.
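A minimal read-after-write sketch in Python covering both failure modes named in this report. FakeRecordAPI, write_and_verify and wait_until_healthy are hypothetical names. The fake API stands in for a real service that returns 200 but drops unrecognised fields, and the health check callable would in practice be a read against whatever readiness endpoint the deployment exposes (an assumption; the report does not specify one).

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class FakeRecordAPI:
    """Hypothetical stand-in for a real API: every write returns 200, but fields
    outside KNOWN_FIELDS are silently discarded."""

    KNOWN_FIELDS = {"name", "email"}

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def put(self, record_id: str, payload: dict) -> int:
        kept = {k: v for k, v in payload.items() if k in self.KNOWN_FIELDS}
        self._store[record_id] = kept
        return 200  # reports success even when fields were dropped

    def get(self, record_id: str) -> Optional[dict]:
        return self._store.get(record_id)


def write_and_verify(api: FakeRecordAPI, record_id: str, payload: dict) -> list[str]:
    """Write, read the record back, and return the fields whose stored value
    differs from what was sent. An empty list means the write is confirmed."""
    status = api.put(record_id, payload)
    if status != 200:
        raise RuntimeError(f"write failed with status {status}")
    stored = api.get(record_id) or {}
    return [key for key, value in payload.items() if stored.get(key) != value]


def wait_until_healthy(check: Callable[[], bool], timeout_s: float = 30.0,
                       interval_s: float = 1.0) -> bool:
    """Poll a read-only health check until it passes or the deadline expires.
    A successful deploy call alone is not treated as proof the service is live."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, write_and_verify(FakeRecordAPI(), "u1", {"name": "Ada", "email": "ada@example.com", "phone": "555-0100"}) returns ['phone']: the write reported 200, but the read-back shows the phone field never landed. Comparing with stored.get(key) treats a field sent as None as matching a missing field; a stricter check would also test key membership.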
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:59:06.261915+00:00— report_created — created