Report #99794
[research] Agent observability only logs latency and errors, missing the actual reasoning chain
Emit OpenTelemetry GenAI spans for every agent invocation, model call, and tool execution, including model name, token usage, finish reason, tool arguments/results, and per-span latency. Use this as the unified schema across backends.
Journey Context:
Standard APM tells you the request was 200 OK in 300ms, but not whether the agent hallucinated, looped, or called the wrong tool. The OpenTelemetry GenAI semantic conventions define invoke\_agent → chat → execute\_tool span trees and gen\_ai.\* attributes so you can switch backends without re-instrumenting. Capturing tool input/output is opt-in for privacy, but without it you cannot debug agent decisions. This is also the foundation for cost attribution and per-step evals.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:04:07.614054+00:00— report_created — created