Report #77255
[research] Multi-agent system produces wrong final answer and it is impossible to tell which agent in the chain failed
Instrument distributed tracing with span attributes for handoff_reason and next_agent. Evaluate the context passed at each handoff boundary, not just the final output.
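A minimal sketch of what recording those span attributes could look like. It uses only the standard library: `InMemoryTracer`, `Span`, and `run_pipeline` are hypothetical names made up for illustration and stand in for whatever tracing backend you actually use. The agent names, the `handoff_reason` values, and the `"none"` marker for the last agent are also assumptions.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    attributes: Dict[str, Any] = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class InMemoryTracer:
    """Hypothetical stand-in for a real tracing backend; keeps spans in a list."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.finished: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str, **attributes: Any) -> Iterator[Span]:
        # The innermost open span becomes the parent, so the chain is recoverable.
        parent_id = self._stack[-1].span_id if self._stack else None
        s = Span(name, self.trace_id, uuid.uuid4().hex[:16], parent_id,
                 dict(attributes), start=time.time())
        self._stack.append(s)
        try:
            yield s
        finally:
            s.end = time.time()
            self._stack.pop()
            self.finished.append(s)


def run_pipeline(tracer: InMemoryTracer, task: str) -> str:
    """Toy orchestrator -> coder -> reviewer chain; the agent bodies are placeholders."""
    with tracer.span("orchestrator", handoff_reason="user_request", next_agent="coder"):
        with tracer.span("coder", handoff_reason="needs_implementation",
                         next_agent="reviewer") as coder_span:
            code = f"# solution for: {task}"
            # Record a cheap fingerprint of what is handed on, not only who receives it.
            coder_span.attributes["handoff_context_keys"] = ["task", "code"]
        with tracer.span("reviewer", handoff_reason="needs_review", next_agent="none"):
            verdict = "approved" if code else "rejected"
    return verdict
```

After a run, `tracer.finished` holds one span per agent, each carrying `handoff_reason` and `next_agent`, so a wrong final answer can be traced back through `parent_id` to the handoff that produced it.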
Journey Context:
In multi-agent systems (e.g., orchestrator -> coder -> reviewer), an error in the final output is often a symptom of missing or hallucinated context at a handoff. If you evaluate only the final output, you cannot tell which agent introduced the error. Adding trace-level evals at each handoff lets you pinpoint exactly where the context was lost. The tradeoff is tighter coupling between the evals and your internal agent architecture, but that coupling is necessary for debugging non-deterministic multi-agent pipelines.
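One way to sketch a handoff-level eval, under stated assumptions: each handoff carries a context dict, each receiving agent has a hypothetical contract of required keys (`REQUIRED_KEYS`), and a value that differs from what an upstream agent passed is treated as a crude proxy for hallucinated context. `Handoff`, `eval_handoff`, and `first_failing_handoff` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass
class Handoff:
    source: str
    target: str
    context: Dict[str, Any]


# Hypothetical contract: keys each receiving agent needs in its incoming context.
REQUIRED_KEYS: Dict[str, Set[str]] = {
    "coder": {"task", "constraints"},
    "reviewer": {"task", "constraints", "code"},
}


def eval_handoff(handoff: Handoff, upstream: Dict[str, Any]) -> List[str]:
    """Return the problems found at one handoff boundary (empty list if none)."""
    problems: List[str] = []
    required = REQUIRED_KEYS.get(handoff.target, set())
    # Missing context: a key the receiver needs was never passed on.
    for key in sorted(required - set(handoff.context)):
        problems.append(f"missing:{key}")
    # Altered context: a value seen upstream was changed in transit.
    for key in sorted(set(handoff.context) & set(upstream)):
        if handoff.context[key] != upstream[key]:
            problems.append(f"altered:{key}")
    return problems


def first_failing_handoff(chain: List[Handoff]) -> Optional[Tuple[str, str, List[str]]]:
    """Walk the chain in order and report the first boundary that fails its eval."""
    upstream: Dict[str, Any] = {}
    for handoff in chain:
        problems = eval_handoff(handoff, upstream)
        if problems:
            return (handoff.source, handoff.target, problems)
        upstream.update(handoff.context)
    return None


example_chain = [
    Handoff("orchestrator", "coder", {"task": "parse CSV", "constraints": "stdlib only"}),
    # The coder drops "constraints" when handing off to the reviewer.
    Handoff("coder", "reviewer", {"task": "parse CSV", "code": "import csv"}),
]
print(first_failing_handoff(example_chain))
# prints ('coder', 'reviewer', ['missing:constraints'])
```

The contract table is where the coupling to your agent architecture shows up: adding or renaming an agent means updating `REQUIRED_KEYS` alongside it.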
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:16:16.054926+00:00— report_created — created