Report #1520
[research] Multi-agent systems hallucinate or lose context during handoffs, but evals only check the final output
Implement trace-level evals that assert on the intermediate state and payload at every agent handoff, not just on the final response. Validate that the receiving agent's prompt contains the exact context it requires from the sender.
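A minimal sketch of such a handoff assertion, assuming each transition has already been captured as a record holding the sender's payload and the prompt the receiver actually saw. The `Handoff` shape, the `check_handoff` name, and the field names are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    """One agent-to-agent transition captured from a trace (hypothetical shape)."""
    sender: str
    receiver: str
    payload: dict          # structured state the sender passed on
    receiver_prompt: str   # the prompt the receiver was actually given


def check_handoff(handoff: Handoff, required_keys: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the handoff passed."""
    edge = f"{handoff.sender}->{handoff.receiver}"
    problems = []
    for key in required_keys:
        if key not in handoff.payload:
            # The sender never passed the variable at all.
            problems.append(f"{edge}: payload missing '{key}'")
            continue
        if str(handoff.payload[key]) not in handoff.receiver_prompt:
            # The variable was passed but never reached the receiver's prompt.
            problems.append(f"{edge}: prompt missing value of '{key}'")
    return problems
```

For example, a handoff with payload `{"order_id": "A-17"}` and the prompt "Write a refund note for order A-17." passes for `required_keys=["order_id"]`, while the same check against an empty payload reports the missing key. Exact substring matching is the strictest reading of "exact required context"; a looser match may suit prompts that reformat values.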
Journey Context:
It is common to treat a multi-agent system as a black box and evaluate only the final output. However, if Agent B fails because Agent A omitted a variable, a final-output eval reports only 'wrong answer' and gives no indication of where the failure occurred. Evaluating the handoff payload (the trace) separates routing and context-passing bugs from reasoning bugs. This requires structured tracing (for example, OpenTelemetry spans) around agent transitions so that the exact context passed can be inspected.
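The separation of handoff bugs from reasoning bugs can be sketched as follows. This uses a plain in-memory list as a stand-in for exported spans and is not the OpenTelemetry API. `TRACE`, `handoff_span`, and `classify_failure` are hypothetical names, and the required keys per edge are assumed to be declared by whoever writes the eval.

```python
from contextlib import contextmanager

TRACE = []  # in-memory stand-in for an exported trace (assumption, not OpenTelemetry)


@contextmanager
def handoff_span(sender, receiver, payload):
    """Record one agent transition, loosely mimicking a tracing span."""
    span = {"sender": sender, "receiver": receiver, "payload": dict(payload)}
    TRACE.append(span)
    yield span


def classify_failure(trace, required, final_ok):
    """Label a run as 'pass', 'reasoning_bug', or a handoff bug at a given edge."""
    for span in trace:
        edge = (span["sender"], span["receiver"])
        missing = [k for k in required.get(edge, []) if k not in span["payload"]]
        if missing:
            # Context was dropped in transit, whatever the final answer looks like.
            return f"handoff_bug at {edge[0]}->{edge[1]}: missing {missing}"
    # Every handoff carried its required context, so a bad answer is a reasoning issue.
    return "pass" if final_ok else "reasoning_bug"
```

In use, each agent transition is wrapped in `with handoff_span("A", "B", payload):` and the eval calls `classify_failure(TRACE, {("A", "B"): ["query", "user_id"]}, final_ok)` after the run. A handoff bug is reported even when the final answer happens to be correct, which surfaces context loss that a final-output eval would miss.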
Lifecycle
2026-06-15T01:31:07.687748+00:00— report_created — created