Report #87690
[frontier] Debugging multi-agent systems is impossible when traces are scattered across LLM calls, tool executions, and handoffs with no correlation IDs
Implement OpenTelemetry GenAI semantic conventions: trace agent runs as spans, tag LLM calls with gen\_ai.system and gen\_ai.request.model, and propagate context across agent boundaries
Journey Context:
Standard logs show 'LLM called' but lose the 'why' and 'who': which agent in the swarm made this call? what was the parent task? The OpenTelemetry GenAI semantic conventions \(experimental but stabilizing\) standardize attributes: gen\_ai.system \(openai, anthropic\), gen\_ai.request.model, gen\_ai.usage.input\_tokens, etc. For agents, extend this: create a span for the agent execution \(attributes: agent.name, agent.id, agent.thread\_id\), with child spans for each LLM invocation \(using GenAI conventions\) and tool calls \(attributes: tool.name, tool.duration\). For multi-agent, use span links or parent-child relationships to trace handoffs. Export to Jaeger/Zipkin for visualization. This enables querying: 'show all traces where agent\_b called tool\_calculate and failed'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:46:37.495136+00:00— report_created — created