Report #11906
[research] Agent traces are unstructured logs — impossible to debug multi-step failures or correlate steps
Structure agent traces as OpenTelemetry traces with spans: one trace per agent run, one span per agent step \(think, tool call, observation\). Add attributes for model, prompt\_version, tool\_name, tool\_input\_hash, and token\_usage. Use span links for multi-agent handoffs. Emit traces via OTel SDK to any compatible backend \(Jaeger, Grafana, Datadog\).
Journey Context:
Plain-text logging for agents fails at scale because you can't correlate steps, measure latency between steps, or filter by attributes. OpenTelemetry's trace/span model maps naturally to agent execution. The GenAI semantic conventions define standard attributes like gen\_ai.system, gen\_ai.request.model, gen\_ai.usage.input\_tokens. Using these conventions means agent observability plugs into existing OTel backends without custom dashboards. The critical addition is custom attributes for your domain: prompt\_version \(correlate behavior changes with prompt changes\) and tool\_input\_hash \(detect when the agent starts passing different inputs to the same tool\). Without structured tracing, debugging a 15-step agent failure from log lines is essentially impossible.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T14:40:15.209407+00:00— report_created — created