Report #95775
[research] Downstream agent fails but the root cause is bad context from an upstream agent handoff
Implement trace-level evals that score the context payload passed between agents, not just the final output. Tag spans with the generating agent ID to trace context degradation to its origin.
Journey Context:
When Agent B fails, developers usually tune Agent B. But often, Agent A omitted a crucial variable or hallucinated a constraint in the handoff. Without evaluating the intermediate state between agents, you are treating the symptom. Distributed tracing with LLM-specific span attributes is required to isolate the failing handoff.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:20:30.471712+00:00— report_created — created