Report #74160
[research] Agent handoffs lose context or hallucinate state during multi-agent transfers
Implement trace-level evals that assert two things: the receiving agent's initial prompt contains every required variable from the sender, and it introduces no phantom variables (values the sender never supplied). Record each handoff as an OpenTelemetry span whose handoff.input and handoff.output attributes conform to a strict schema.
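A minimal sketch of the two assertions, in plain Python with no tracing library. The function name check_handoff, the variable names, and the example payloads are hypothetical and only illustrate the idea; a real eval would read the payloads from recorded spans.

```python
def check_handoff(sender_output: dict, receiver_input: dict, required: set) -> dict:
    """Compare what the sender emitted with what the receiver was given.

    All names here are illustrative assumptions, not part of any library.
    """
    sent = set(sender_output)
    received = set(receiver_input)
    # Required variables that never reached the receiver.
    missing = sorted(required - received)
    # Variables the receiver has that the sender never produced.
    phantom = sorted(received - sent)
    return {"ok": not missing and not phantom, "missing": missing, "phantom": phantom}


# Hypothetical handoff: customer_id is dropped and region appears from nowhere.
sender_output = {"order_id": "A-17", "customer_id": "C-9", "locale": "en"}
receiver_input = {"order_id": "A-17", "region": "EU"}
result = check_handoff(sender_output, receiver_input, {"order_id", "customer_id"})
# result == {"ok": False, "missing": ["customer_id"], "phantom": ["region"]}
```

Optional variables such as locale may be dropped without failing the check; only names listed in required count as missing.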
Journey Context:
Evaluating only the final output of a multi-agent pipeline hides handoff failures. An agent might successfully complete a task by hallucinating a missing variable that coincidentally worked, or fail because it dropped a critical ID. By evaluating the trace at the handoff boundary (the exact JSON payload passed between agents), you catch context collapse early. This treats agent handoffs as strict API contracts rather than conversational suggestions.
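One way to evaluate at the boundary is to walk an exported trace and compare each agent's handoff.output with the next agent's handoff.input. The sketch below assumes spans are exported as plain dicts in pipeline order, that both attributes hold JSON-encoded strings, and that every handoff is a strict pass-through; the span layout, the function handoff_mismatches, and the example trace are all hypothetical.

```python
import json


def handoff_mismatches(spans: list) -> list:
    """Compare each span's handoff.output with the next span's handoff.input.

    Assumes an illustrative span shape: {"name": str, "attributes": {...}}.
    """
    problems = []
    for sender, receiver in zip(spans, spans[1:]):
        sent = json.loads(sender["attributes"]["handoff.output"])
        received = json.loads(receiver["attributes"]["handoff.input"])
        boundary = f'{sender["name"]} -> {receiver["name"]}'
        for key in sorted(sent.keys() - received.keys()):
            problems.append(f"{boundary}: dropped {key}")
        for key in sorted(received.keys() - sent.keys()):
            problems.append(f"{boundary}: phantom {key}")
        for key in sorted(sent.keys() & received.keys()):
            # Same name but a different value: the state was altered in transit.
            if sent[key] != received[key]:
                problems.append(f"{boundary}: changed {key}")
    return problems


# Hypothetical three-agent trace with two boundary failures.
trace = [
    {"name": "planner", "attributes": {
        "handoff.output": json.dumps({"ticket_id": "T-42", "priority": "high"})}},
    {"name": "executor", "attributes": {
        "handoff.input": json.dumps({"ticket_id": "T-24", "priority": "high"}),
        "handoff.output": json.dumps({"status": "done"})}},
    {"name": "reporter", "attributes": {
        "handoff.input": json.dumps({"status": "done", "summary": "ok"})}},
]
problems = handoff_mismatches(trace)
# problems == ["planner -> executor: changed ticket_id",
#              "executor -> reporter: phantom summary"]
```

The final reporter output could still look correct in this trace, which is why the comparison runs per boundary rather than on the end result.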
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:04:34.016792+00:00— report_created — created