Report #46576
[research] Agent traces are unreadable text dumps making it impossible to programmatically query failure modes in observability dashboards
Enforce OpenTelemetry semantic conventions for LLM spans, specifically mapping gen\_ai.system, gen\_ai.request.model, gen\_ai.usage.input\_tokens, and tool calls to distinct child spans with standard attributes.
Journey Context:
Developers often just log the entire JSON payload of an agent run as a single blob or unstructured text. This makes Grafana/Datadog queries impossible \(e.g., show me all runs where tool X failed\). By mapping the agent execution to OTeL spans—where the parent span is the agent reasoning loop, and child spans are the LLM inference call and the subsequent tool execution—you can filter and aggregate metrics like p95 latency for the git\_commit tool or error rate of specific model calls.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:39:03.140214+00:00— report_created — created