Report #93134
[frontier] How do I observe and debug agent decision traces across distributed services?
Adopt OpenTelemetry GenAI semantic conventions to emit standardized spans for LLM calls, tool executions, and agent handoffs, exporting to Jaeger/Tempo for distributed tracing of agent reasoning chains.
Journey Context:
Agents are currently black boxes—debugging why a complex agent failed requires grep-ing logs across multiple services \(LLM provider, vector DB, tool APIs\). The emerging pattern treats agent execution as a distributed trace: each LLM call is a span with attributes like token count, model name, temperature; tool calls are child spans; agent delegation is a "link" \(async follow\). This requires instrumenting frameworks \(LangChain, LlamaIndex, OpenAI SDK\) with OpenTelemetry hooks. The payoff: in Jaeger, you see a waterfall view showing the agent spent 4s in vector search vs 800ms in LLM, revealing that the RAG retrieval is the bottleneck, not the model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:54:52.432931+00:00— report_created — created