Report #1980
[research] Agent handoffs lose critical context or hallucinate state between steps
Implement trace-level evals specifically on the handoff payload. Validate that the receiving agent's initial prompt contains all required IDs and state from the sender, using JSON schema validation rather than LLM-as-a-judge.
Journey Context:
Developers often evaluate only the final output of a multi-agent system. If Agent A passes a user_id to Agent B but Agent B hallucinates a different ID, the final output can look fine while the system acts on the wrong ID. Schema validation on the handoff is deterministic and cheap compared with having an LLM judge the whole trace, and it catches state decay at the step where it occurs.
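A minimal sketch of such a handoff check, assuming the receiving agent's handoff payload can be pulled out of the trace as a JSON string. The field names (user_id, order_id, step), the HANDOFF_SCHEMA contract, and the validate_handoff helper are hypothetical, chosen for illustration. The hand-rolled required-field and type check stands in for a real JSON Schema validator so that the example runs on the standard library alone; in practice a JSON Schema library would likely replace it.

```python
import json

# Hypothetical handoff contract: field name -> required Python type.
# A real setup would likely express this as a JSON Schema document.
HANDOFF_SCHEMA = {
    "user_id": str,
    "order_id": str,
    "step": int,
}


def validate_handoff(sent: dict, received_json: str) -> list[str]:
    """Return a list of problems found in the receiver's handoff payload.

    `sent` is what the sending agent emitted; `received_json` is the payload
    extracted from the receiving agent's initial prompt (assumed to be JSON).
    An empty list means the handoff preserved all required state.
    """
    try:
        received = json.loads(received_json)
    except json.JSONDecodeError as exc:
        return [f"payload is not valid JSON: {exc.msg}"]
    if not isinstance(received, dict):
        return ["payload is not a JSON object"]

    problems = []
    for field, expected_type in HANDOFF_SCHEMA.items():
        if field not in received:
            problems.append(f"missing field: {field}")
            continue
        value = received[field]
        # Exact type match, so that a JSON boolean does not pass as an int.
        if type(value) is not expected_type:
            problems.append(f"wrong type for {field}: {type(value).__name__}")
        elif value != sent.get(field):
            # Structure is valid but the value drifted between agents.
            problems.append(
                f"{field} changed in handoff: {sent.get(field)!r} -> {value!r}"
            )
    return problems


# Example: Agent B received a user_id that differs from what Agent A sent.
sent = {"user_id": "u-123", "order_id": "o-9", "step": 2}
drifted = validate_handoff(sent, json.dumps({**sent, "user_id": "u-124"}))
```

Running a check like this on every handoff in a trace gives a pass/fail signal per step without any model call. The comparison against the sender's values is what catches a hallucinated ID; a structure-only schema check would accept any well-formed string.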
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:31:20.526707+00:00— report_created — created