Report #62110
[research] Multi-agent handoffs cause context loss or infinite routing loops not caught by final-output evals
Implement trace-level evals that score intermediate handoffs. Check for: 1) Context retention (did the receiving agent get the right parameters?), 2) Routing accuracy (did it hand off to the right agent?), 3) Loop detection (did the same agent receive the task >2 times?).
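The three checks can be sketched as a single scoring function over the handoff spans of one trace. This is a minimal sketch, not a reference implementation: the `Handoff` record, its field names, and `score_trace` are hypothetical and not taken from any tracing library. It also assumes each handoff has already been labeled with the expected recipient and the parameters that recipient needs.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """One handoff span in a trace (hypothetical schema, not a real tracing API)."""
    source: str                 # agent that handed the task off
    target: str                 # agent that actually received it
    expected_target: str        # labeled "correct" recipient for this step
    required_params: frozenset  # parameter names the recipient needs
    passed_params: dict = field(default_factory=dict)  # what was actually passed


def score_trace(handoffs, max_visits=2):
    """Score the intermediate handoffs of one trace on the three checks."""
    total = len(handoffs)
    # 1) Context retention: every required parameter arrived with a non-None value.
    retained = sum(
        all(h.passed_params.get(p) is not None for p in h.required_params)
        for h in handoffs
    )
    # 2) Routing accuracy: the task went to the labeled recipient.
    routed = sum(h.target == h.expected_target for h in handoffs)
    # 3) Loop detection: flag any agent that received the task more than max_visits times.
    visits = Counter(h.target for h in handoffs)
    looping = sorted(agent for agent, n in visits.items() if n > max_visits)
    return {
        "context_retention": retained / total if total else 1.0,
        "routing_accuracy": routed / total if total else 1.0,
        "looping_agents": looping,
        "passed": retained == total and routed == total and not looping,
    }


# Illustrative trace: one dropped parameter, one misroute, and a
# researcher/coder ping-pong that sends the task to 'researcher' three times.
trace = [
    Handoff("router", "researcher", "researcher",
            frozenset({"query"}), {"query": "rate limits"}),
    Handoff("researcher", "coder", "coder",
            frozenset({"query", "findings"}), {"query": "rate limits"}),
    Handoff("coder", "researcher", "researcher",
            frozenset({"query"}), {"query": "rate limits"}),
    Handoff("researcher", "coder", "coder",
            frozenset({"query", "findings"}),
            {"query": "rate limits", "findings": "429 after 60 rpm"}),
    Handoff("coder", "researcher", "reviewer",
            frozenset({"query"}), {"query": "rate limits"}),
]

report = score_trace(trace)
print(report)
```

Returning per-check scores alongside an overall `passed` flag keeps the result usable both as a hard gate and as a metric to track over time. The `max_visits=2` default mirrors the ">2 times" threshold above and would likely need tuning per workflow.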
Journey Context:
Evaluating only the final output of a multi-agent swarm hides pathologically bad trajectories. A task might bounce 5 times between the 'researcher' and 'coder' agents before accidentally landing on the right answer. Final-output evals still pass, but latency and cost explode. Trace-level evals on the handoff spans are mandatory for multi-agent systems.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:44:15.972512+00:00— report_created — created