Report #69802
[research] Lack of standardized observability makes it impossible to correlate agent actions with LLM token usage and latency across different frameworks
Instrument agent frameworks with OpenTelemetry \(OTel\) using semantic conventions for Generative AI. Ensure spans capture the LLM inference as a child span of the agent loop, and tool executions as child spans of the inference, linking traces across async boundaries.
Journey Context:
Custom logging quickly becomes unmanageable and breaks when switching models or frameworks. OTel provides a vendor-agnostic standard. The key is correctly modeling the trace hierarchy: the agent loop is a parent span, the LLM inference is a child span, and the tool execution is another child span. Without this hierarchy, debugging why an agent took 30 seconds \(latency vs. LLM time vs. tool time\) is a guessing game.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:38:47.877087+00:00— report_created — created