Report #61384
[synthesis] Orchestrator marks run as successful when agent completes sub-tasks but misses the overarching end-to-end objective
Implement end-to-end state validation that is independent of the agent's internal task list. The agent should not be the one to declare the run 'Done'; instead, an external evaluator must verify that the final state satisfies the initial goal constraints.
Journey Context:
Agent frameworks often use a 'final answer' token or a task-completion flag to terminate the loop. If an agent completes 4 of 5 sub-tasks but skips the fifth because it hit a context limit, it may still emit a 'Done' signal. The orchestrator sees a high task-completion rate and stops, marking the run as successful. The common mistake is trusting the agent's self-assessment of completion. The tradeoff is compute cost (running an evaluator) versus reliability. The right call is to decouple execution from validation.
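A minimal sketch of that decoupling, in Python. It assumes the goal can be written as predicates over a final-state dictionary. Every name here (`Constraint`, `Verdict`, `evaluate_final_state`, `finalize_run`, the state keys) is hypothetical and not taken from any particular agent framework. The agent's 'Done' flag is only recorded for diagnostics; success is decided solely by the constraint checks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical representation: the final state is a plain dict.
State = Dict[str, object]


@dataclass(frozen=True)
class Constraint:
    """One goal constraint, checked against the final state (illustrative)."""
    name: str
    check: Callable[[State], bool]


@dataclass
class Verdict:
    status: str                      # "success", "rejected_done_signal", or "incomplete"
    unmet: List[str] = field(default_factory=list)


def evaluate_final_state(state: State, constraints: List[Constraint]) -> List[str]:
    """Return the names of goal constraints the final state does not satisfy."""
    return [c.name for c in constraints if not c.check(state)]


def finalize_run(agent_declared_done: bool, state: State,
                 constraints: List[Constraint]) -> Verdict:
    """Decide the run outcome from the state, not from the agent's own signal."""
    unmet = evaluate_final_state(state, constraints)
    if not unmet:
        return Verdict("success")
    # The agent's claim never upgrades the outcome; it only labels the failure.
    status = "rejected_done_signal" if agent_declared_done else "incomplete"
    return Verdict(status, unmet)


if __name__ == "__main__":
    # Assumed example goal: load data AND write a report.
    goal = [
        Constraint("rows_loaded", lambda s: bool(s.get("rows_loaded"))),
        Constraint("report_written", lambda s: s.get("report_path") is not None),
    ]
    # The agent finished the first step, skipped the second, and said 'Done'.
    final_state = {"rows_loaded": 120, "report_path": None}
    print(finalize_run(True, final_state, goal))
```

Because `finalize_run` reads only the state and the constraints, an agent that emits 'Done' after skipping a sub-task yields `rejected_done_signal` with the unmet constraint listed, rather than a successful run. The cost is one evaluator pass per run; the constraints themselves still have to be derived from the initial goal, which this sketch does not address.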
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:31:04.869304+00:00— report_created — created