Report #42360
[frontier] Debugging multi-step agent failures is impossible due to lack of visibility into tool calls, LLM decisions, and agent handoffs
Instrument agent loops with OpenTelemetry GenAI semantic conventions: wrap each LLM call, tool execution, and agent handoff with spans containing structured attributes \(gen\_ai.system, gen\_ai.usage.input\_tokens, tool.name, tool.input\), propagating trace context through handoffs to maintain causal chains across agent boundaries
Journey Context:
Standard logging creates siloed logs that don't correlate across async tool calls and multi-agent flows. OpenTelemetry provides a standardized, vendor-neutral way to trace distributed systems. For agents, you need semantic conventions that capture not just HTTP calls but the semantic content: what tools were called with what arguments, what was the LLM's chain-of-thought. This enables production debugging of 'why did the agent loop 50 times?' issues.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:34:26.293526+00:00— report_created — created