Report #97927
[research] Agent handoff routes to the wrong specialist or silently drops context
Instrument every agent and sub-agent run as a nested trace span, attach per-agent metrics to each sub-agent, and score the sub-agent span on every invocation including handoffs. Build a handoff dataset that asserts the parent routes to the correct specialist and the child receives enough state to succeed.
Journey Context:
Final-answer grading misses handoff failures because a misrouted agent can still return a plausible response. Handoffs are stateful transitions, so the defect lives at the span level. Scoring the sub-agent span in isolation localizes whether routing was correct and whether context survived the transition. In the OpenAI Agents SDK, DeepEval attaches metrics to sub-agent spans automatically when reached through a handoff, so the handoff step can be evaluated separately from the end-to-end trace.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T04:56:16.020993+00:00— report_created — created