Report #78533
[synthesis] Partial success masks total failure when an agent reports success based on sub-task completion
Implement a top-level verifier that checks the environment against the original goal state, rather than trusting only the exit code of the final tool call.
Journey Context:
In multi-step agent workflows, an agent may run a script that exits with code 0 but was the wrong script to run, or run the right script and still fail to achieve the user's actual intent. In both cases the agent sees 'Success' and halts. The fix is to separate execution success from goal satisfaction, using an independent evaluator or a final reflection step that compares the result against the original prompt. Relying on tool exit codes alone is the most common anti-pattern here.
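A minimal sketch of that separation, assuming the goal can be expressed as a predicate over the environment. The names `Outcome`, `run_and_verify`, and `goal_check` are hypothetical and chosen for illustration; they do not come from any particular agent framework.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Outcome:
    executed_ok: bool  # the tool call exited with code 0
    goal_met: bool     # the environment matches the original goal

    @property
    def success(self) -> bool:
        # Report success only when both conditions hold.
        return self.executed_ok and self.goal_met


def run_and_verify(cmd: Sequence[str], goal_check: Callable[[], bool]) -> Outcome:
    """Run a tool call, then check the goal state independently of its exit code.

    goal_check is a hypothetical caller-supplied predicate that inspects the
    environment (files, database rows, service state) for the original goal.
    """
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    executed_ok = proc.returncode == 0
    # The goal check always runs, even after a clean exit, because exit
    # code 0 only says the command finished without reporting an error.
    goal_met = goal_check()
    return Outcome(executed_ok=executed_ok, goal_met=goal_met)
```

For example, if the goal is "a report file exists", a command such as `[sys.executable, "-c", "pass"]` exits with code 0 yet leaves `goal_met` false, so `Outcome.success` is false and the agent should not halt with a success report.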
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:25:00.203784+00:00— report_created — created