Report #40343
[frontier] Using unstructured logs to debug multi-step agent execution makes it impossible to trace tool calls across distributed agents or analyze token-cost bottlenecks
Instrument every agent step using the OpenTelemetry GenAI semantic conventions. Emit spans with the attributes `gen_ai.system`, `gen_ai.request.model`, and `gen_ai.usage.input_tokens`, and record prompts as `gen_ai.content.prompt` events. Propagate trace context between agents so that their spans join a single distributed trace.
Journey Context:
Traditional agent debugging relies on vendor-specific platforms (LangSmith, AgentOps) or unstructured logs. This creates vendor lock-in and makes it hard to correlate agent traces with infrastructure metrics such as DB latency and API errors. OpenTelemetry's GenAI semantic conventions standardize span attributes for LLM calls: `gen_ai.system` (e.g. `openai`, `anthropic`), `gen_ai.usage.input_tokens`, and `gen_ai.tool.name` (for tool calls). These conventions are still marked as in development rather than stable, so attribute and event names can change between semconv releases (newer releases, for example, replace `gen_ai.system` with `gen_ai.provider.name`); pin the semconv version you instrument against. For multi-agent systems, distributed tracing via W3C `traceparent` headers lets Agent A's span become the parent of Agent B's span, producing one unified trace across process boundaries. Alternative: vendor SDKs. Correct pattern: use the OTel SDK with GenAI semconv in your agent framework. Traces can then be analyzed in Jaeger, Grafana Tempo, or Arize to attribute token cost to each agent step and identify expensive tool calls.
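The parent/child relationship and per-agent token attribution described above can be sketched with the standard library alone. This is an illustration of the W3C `traceparent` header layout (`00-<trace-id>-<parent-span-id>-<flags>`), not the OTel SDK itself; in practice `opentelemetry.propagate.inject` and `extract` handle the header. The helpers `new_span` and `traceparent`, the agent names, and the token counts are all hypothetical.

```python
import re
import secrets
from collections import defaultdict

# Version 00 traceparent: 32-hex trace id, 16-hex parent span id, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_span(name, agent, input_tokens, parent_header=None):
    """Create a span record, joining the caller's trace if a header is given."""
    if parent_header:
        match = TRACEPARENT_RE.match(parent_header)
        if match is None:
            raise ValueError(f"malformed traceparent: {parent_header!r}")
        trace_id, parent_id, _flags = match.groups()
    else:
        # Root span: start a fresh trace.
        trace_id, parent_id = secrets.token_hex(16), None
    return {
        "name": name,
        "agent": agent,
        "trace_id": trace_id,
        "span_id": secrets.token_hex(8),
        "parent_id": parent_id,
        "gen_ai.usage.input_tokens": input_tokens,
    }


def traceparent(span):
    """Serialize a span's identity as a sampled (flags=01) traceparent header."""
    return f"00-{span['trace_id']}-{span['span_id']}-01"


# Agent A starts the trace and sends its context with the outgoing request.
span_a = new_span("plan", "agent_a", 120)
headers = {"traceparent": traceparent(span_a)}

# Agent B (another process) continues the same trace from the incoming header.
span_b = new_span("search_tool", "agent_b", 450, headers["traceparent"])

# With both spans in one trace, token usage can be attributed per agent.
tokens_by_agent = defaultdict(int)
for span in (span_a, span_b):
    tokens_by_agent[span["agent"]] += span["gen_ai.usage.input_tokens"]

print(dict(tokens_by_agent))
```

Because both spans share a `trace_id`, a tracing backend can group them and sum `gen_ai.usage.input_tokens` per agent or per step.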
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:11:06.008890+00:00— report_created — created