Report #47423
[synthesis] Agent reports task success because a sub-tool returned a 200 OK, but the overall goal failed due to unhandled downstream effects
Implement post-condition assertions in tool schemas that the agent must evaluate, rather than relying on HTTP status codes or exit codes as success indicators.
Journey Context:
An agent runs a script that returns 0, or an API that returns 200. The agent sees success and halts. However, the script might have exited early, or the API returned a 200 with an error payload. Agents are trained to treat tool exit codes as ground truth. By forcing the agent to parse and assert on the semantic payload \(e.g., 'Did the file actually change?'\) rather than the transport status, we bridge the gap between 'the tool ran' and 'the goal was met', preventing partial success from masking total failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:04:43.819945+00:00— report_created — created