Report #13700
[research] Multi-agent handoffs lose context or loop infinitely, but evals only check the final output
Assert on the trace graph of the agent run: validate that the expected agent transitions occurred, that loop counts stayed under a threshold, and that context variables were actually passed at handoff boundaries.
Journey Context:
When Agent A delegates to Agent B, the final answer can still be correct even if B had to rediscover context A already had, or if A and B ping-ponged 5 times before settling. Evaluating only the final output therefore misses the wasted tokens and added latency. Evaluate the trajectory instead, specifically the spans that represent agent invocations, to confirm that handoffs are direct and context is preserved, and cap max-turns in evals so that infinite loops are caught.
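A minimal sketch of such a trajectory check, assuming the run has already been exported as an ordered list of agent-invocation spans. The `Span` schema, the agent names (`planner`, `researcher`), the `ticket_id` key, and the thresholds are all hypothetical and stand in for whatever your tracing setup records.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Span:
    """One agent invocation in a recorded run (hypothetical schema)."""
    agent: str
    context: dict = field(default_factory=dict)  # variables visible to the agent at start


def transitions(trace):
    """Return the (from_agent, to_agent) handoff pairs, in order."""
    return [(a.agent, b.agent) for a, b in zip(trace, trace[1:]) if a.agent != b.agent]


def check_trace(trace, required, max_repeats, required_context, max_turns):
    """Return a list of failure messages; an empty list means the trace passed."""
    failures = []
    # Turn cap: a run that exceeds it is treated as a suspected loop.
    if len(trace) > max_turns:
        failures.append(f"turn cap exceeded: {len(trace)} > {max_turns}")
    hops = transitions(trace)
    # Required transitions must appear at least once.
    for edge in required:
        if edge not in hops:
            failures.append(f"missing handoff {edge[0]} -> {edge[1]}")
    # The same handoff repeating too often indicates ping-ponging.
    for edge, count in Counter(hops).items():
        if count > max_repeats:
            failures.append(f"handoff {edge[0]} -> {edge[1]} repeated {count} times")
    # At each handoff boundary, the receiving agent must start with its required keys.
    for prev, span in zip(trace, trace[1:]):
        if prev.agent == span.agent:
            continue
        for key in required_context.get(span.agent, ()):
            if key not in span.context:
                failures.append(f"{span.agent} missing context key {key!r}")
    return failures


# Hypothetical eval rules: one direct handoff, context carried across it.
RULES = dict(
    required=[("planner", "researcher")],
    max_repeats=1,
    required_context={"researcher": ["ticket_id"]},
    max_turns=4,
)

good = [Span("planner", {"ticket_id": 7}), Span("researcher", {"ticket_id": 7})]
# Same final answer is possible here, but the agents ping-pong and context is dropped.
looping = [Span("planner", {"ticket_id": 7}), Span("researcher")] * 3

good_failures = check_trace(good, **RULES)
looping_failures = check_trace(looping, **RULES)
```

Returning a list of failures rather than raising on the first one lets a single eval run report every handoff problem in the trace; an eval harness would then assert that the list is empty.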
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T19:37:09.433139+00:00— report_created — created