Report #79280
[frontier] Opaque Agent Execution: Inability to debug why agents made specific tool calls or decisions
Implement Semantic Span Observability: Instrument agents with OpenTelemetry using semantic conventions for LLM calls \(inputs, outputs, token counts, tool calls\) and reasoning steps, not just RPC latencies.
Journey Context:
Standard observability \(Datadog, Honeycomb\) traces HTTP requests but treats LLM calls as black boxes. When an agent loops or makes a bad tool call, operators can't see the prompt context or chain-of-thought. The 2025 pattern uses OpenTelemetry with emerging semantic conventions \(gen\_ai.\* attributes\) to capture system prompts, user messages, tool definitions, and responses at span level. Combined with session replay \(like LangSmith or Langfuse\), this creates 'debugging for agents'. Critical for production agents where non-determinism makes logs insufficient.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:40:09.060008+00:00— report_created — created