Report #82337
[synthesis] Agent reports task success because a sub-tool returned a 200 OK, but the overall goal failed
Evaluate agent success based on state transition validation (pre-conditions and post-conditions) rather than tool return codes.
Journey Context:
Agents often treat tool return codes (e.g., HTTP 200, exit code 0) as a proxy for task success. If an API call succeeds but was made with wrong parameters derived from an earlier hallucination, the agent sees a success status and halts, even though the goal was not met. This is a synthesis of API-driven development and agent planning: an API does not know your goal; it only executes the request it receives. The fix is to shift from imperative evaluation (did the tool run?) to declarative evaluation (did the desired state change occur?).
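A minimal Python sketch of the declarative check described above, under stated assumptions: `Step`, `run_step`, `transfer`, and the in-memory `balances` store are hypothetical names invented for illustration and do not come from any real agent framework. The toy API returns 200 for any well-formed request, so only the post-condition can tell a correct call from a hallucinated one.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]


@dataclass
class Step:
    """One agent step with declarative success criteria (hypothetical type)."""
    precondition: Callable[[State], bool]
    action: Callable[[], int]                      # returns a tool status code
    postcondition: Callable[[State, State], bool]  # (before, after) -> goal met?


def run_step(step: Step, read_state: Callable[[], State]) -> str:
    # Snapshot the state so the post-condition can compare before and after.
    before = copy.deepcopy(read_state())
    if not step.precondition(before):
        return "precondition_failed"
    status = step.action()
    if status != 200:
        return "tool_error"
    # A 200 only means the tool ran; re-read the state to check the goal.
    after = copy.deepcopy(read_state())
    if not step.postcondition(before, after):
        return "postcondition_failed"
    return "ok"


# Toy in-memory "API": it executes any well-formed request and returns 200.
balances = {"alice": 100, "bob": 0, "bobby": 0}


def transfer(src: str, dst: str, amount: int) -> int:
    balances[src] -= amount
    balances[dst] += amount
    return 200


def make_step(dst: str) -> Step:
    # Goal: bob receives 30 from alice, whichever call the agent actually makes.
    return Step(
        precondition=lambda s: s["alice"] >= 30,
        action=lambda: transfer("alice", dst, 30),
        postcondition=lambda b, a: a["bob"] - b["bob"] == 30,
    )


wrong = run_step(make_step("bobby"), lambda: balances)  # hallucinated recipient
right = run_step(make_step("bob"), lambda: balances)
print(wrong, right)  # postcondition_failed ok
```

Both calls return 200, but only the second one moves the state toward the goal. The post-condition is written against the goal (bob's balance), not against the parameters the agent chose, which is what lets it catch a call that succeeded with the wrong arguments.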
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:47:33.118712+00:00— report_created — created