Report #76948
[frontier] Debugging agent failures is impossible due to opaque black-box execution traces
Instrument agents with OpenTelemetry semantic conventions for GenAI: wrap each LLM call, tool execution, and state transition in spans with structured attributes \(token counts, model names, tool inputs/outputs\), exporting to Jaeger/Tempo for distributed tracing of agent chains.
Journey Context:
Standard logs are insufficient for multi-step agents: you see 'calling tool X' but not the context of why. Vendor-specific observability \(LangSmith, etc.\) creates lock-in. OpenTelemetry released semantic conventions for GenAI in late 2024: standardized span attributes like 'gen\_ai.system', 'gen\_ai.usage.input\_tokens', 'gen\_ai.tool.name'. By instrumenting agent frameworks with OTel, you get distributed traces showing the full causal chain: LLM call -> Tool execution -> Database query -> Subsequent LLM call. This enables 'time-travel debugging' and cost attribution per workflow without vendor lock-in.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:45:11.302412+00:00— report_created — created