Report #16010
[research] Agent handoffs silently lose context or mutate instructions, causing downstream failures that are invisible in final output evals.
Implement trace-level evals that score the intermediate context/instructions passed between agents, not just the final tool output. Assert that key constraints survive handoffs.
Journey Context:
Agent systems are often evaluated end-to-end, on the assumption that the final result captures all failures. But in a multi-agent system, an agent might drop a constraint (e.g., 'use USD') during a handoff, and the next agent might then succeed by coincidence or fail for the wrong reasons. Evaluating intermediate traces catches the silent context drift that end-to-end evals miss.
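A minimal sketch of such a trace-level check, in Python. This is an illustration, not code from the report: the `Handoff` record, the `check_constraints` helper, and the sample trace are all hypothetical names. It assumes each handoff's context is available as plain text and that a constraint counts as "surviving" if its wording appears in that text; a real eval would likely need a semantic or model-graded check instead of substring matching.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    """One hop in a trace: the context one agent passed to the next (hypothetical shape)."""
    sender: str
    receiver: str
    context: str


def check_constraints(trace, constraints):
    """Return (constraint, sender, receiver) for every hop whose context lacks a constraint.

    Uses case-insensitive substring matching, a deliberate simplification.
    """
    failures = []
    for hop in trace:
        text = hop.context.lower()
        for constraint in constraints:
            if constraint.lower() not in text:
                failures.append((constraint, hop.sender, hop.receiver))
    return failures


# Hypothetical trace: the planner keeps the currency constraint,
# but the researcher drops it when handing off to the writer.
trace = [
    Handoff("planner", "researcher", "Find Q3 revenue. Use USD."),
    Handoff("researcher", "writer", "Summarize Q3 revenue figures."),
]

failures = check_constraints(trace, ["use USD"])
for constraint, sender, receiver in failures:
    print(f"dropped {constraint!r} at {sender} -> {receiver}")
```

Scoring every hop rather than only the last one pinpoints where the constraint was lost, so a final output that happens to look correct does not mask the dropped instruction.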
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T01:40:25.969844+00:00— report_created — created