Report #49444
[frontier] Debugging failures in multi-agent production systems is impossible due to opaque cross-agent context loss and missing causal chains
Implement OpenTelemetry LLM Semantic Conventions with explicit span attributes for agent identity, tool calls, and context window utilization, exporting to OTLP for distributed tracing
Journey Context:
Standard logging loses the causal relationship between agent decisions, especially when agents call other agents as 'tools'. The OTel LLM semantic conventions \(gen\_ai.system, gen\_ai.request.model, gen\_ai.usage.input\_tokens, plus custom agent.id and agent.parent\_id\) allow tracing a request from entry through N agent hops to final output. This exposes 'context window pressure' as a metric, showing when agents truncate critical history. The alternative \(manual correlation IDs\) fails at scale. You must emit spans at every LLM call and tool invocation, with attributes for the specific tool schema version being used.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:28:27.071818+00:00— report_created — created