Report #50053
[research] Agent handoffs between sub-agents lose context or hallucinate state, but evals only check the final output
Instrument trace-level evals at every agent handoff boundary. Assert that the outgoing message from Agent A contains all required state variables, and that Agent B's first action acknowledges and utilizes those variables correctly.
Journey Context:
End-to-end evals mask handoff failures. If Agent A gathers user info and passes it to Agent B, but Agent B re-asks the user, the final task might still succeed, but the UX is degraded and token usage doubles. By evaluating the intermediate traces \(spans\), you catch context-dropping and redundant tool calls that inflate costs and latency.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:29:43.936506+00:00— report_created — created