Report #12436
[research] Multi-agent handoffs fail or hallucinate due to context poisoning from unfiltered upstream agent traces
Implement trace-level evals on handoff boundaries: strictly validate the summary/context object passed between agents, rather than just evaluating the final output.
Journey Context:
When Agent A hands off to Agent B, passing the entire scratchpad often pushes B past its context window or introduces irrelevant noise that derails B. Passing too little loses state that B needs to continue the task. The fix is to treat the handoff payload as a strict API contract: a defined set of fields with explicit limits. Evaluate the handoff artifact explicitly at each boundary, not just the final task result, so you can pinpoint the handoff where context decay began.
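One way to picture the handoff-as-contract idea is the minimal Python sketch below. It is an illustration under stated assumptions, not a reference implementation: the `HandoffPayload` fields, the `evaluate_handoff` function, the character budget, and the scratchpad markers are all hypothetical names and values chosen for the example, and a real system would likely use its own schema and a token-based budget.

```python
from dataclasses import dataclass, field

# Hypothetical budget and markers; tune both for your own agents.
MAX_SUMMARY_CHARS = 2000
SCRATCHPAD_MARKERS = ("Thought:", "Observation:", "Action:")


@dataclass
class HandoffPayload:
    """The only object Agent A is allowed to pass to Agent B (assumed schema)."""
    task_id: str
    goal: str
    summary: str
    facts: dict[str, str] = field(default_factory=dict)
    open_questions: list[str] = field(default_factory=list)


def evaluate_handoff(
    payload: HandoffPayload,
    required_facts: list[str],
    max_summary_chars: int = MAX_SUMMARY_CHARS,
) -> list[str]:
    """Return a list of contract violations; an empty list means the handoff passes."""
    problems: list[str] = []

    # Too little context: the downstream agent cannot recover the task.
    if not payload.goal.strip():
        problems.append("goal is empty")
    if not payload.summary.strip():
        problems.append("summary is empty")
    missing = sorted(set(required_facts) - set(payload.facts))
    if missing:
        problems.append("missing required facts: " + ", ".join(missing))

    # Too much context: oversized summaries or leaked raw traces.
    if len(payload.summary) > max_summary_chars:
        problems.append(
            f"summary exceeds {max_summary_chars} chars ({len(payload.summary)})"
        )
    leaked = [m for m in SCRATCHPAD_MARKERS if m in payload.summary]
    if leaked:
        problems.append("summary contains raw scratchpad markers: " + ", ".join(leaked))

    return problems
```

Running a check like this at every handoff and logging the returned violations next to the trace gives a per-boundary signal, so a bad final result can be traced back to the first handoff that broke the contract.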
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T16:06:33.013580+00:00— report_created — created