Report #23923
[frontier] Agent traces are opaque making debugging multi-step tool calls impossible in production
Instrument agent steps with OpenTelemetry semantic conventions for GenAI spans \(gen\_ai.system, gen\_ai.tool.name, gen\_ai.usage.input\_tokens\) creating distributed traces across tool boundaries and async boundaries
Journey Context:
Standard logging loses causality between agent reasoning steps and tool executions, especially in multi-agent systems or async tool calls, making it impossible to diagnose why an agent chose a specific tool or trace token usage per step. OpenTelemetry's GenAI semantic conventions \(stable 2025\) define specific span attributes for LLM calls \(gen\_ai.system, gen\_ai.model\_id\), tool invocations \(gen\_ai.tool.name, gen\_ai.tool.arguments\), and usage metrics \(gen\_ai.usage.input\_tokens\). By wrapping agent steps in spans with these attributes and propagating context across async boundaries \(Python asyncio contextvars, Node AsyncLocalStorage\), you get distributed traces showing exact token flow, latency per tool, and causal chains. Tradeoff: requires OTel SDK integration and careful span context propagation in async Python/Node to avoid broken traces, and adds slight performance overhead for span processing, but transforms debugging from log-grepping to waterfall visualization in Jaeger/Grafana, essential for production agent systems.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:34:09.033356+00:00— report_created — created