Report #93775
[frontier] Agent execution failures are unreproducible because intermediate state and decisions are not recorded
Implement agent tracing with structured event logs: record every LLM input/output, tool call and result, routing decision, and context compaction event as an immutable event stream. Design traces to be replayable: given the same initial state and event log, a replay that serves the recorded LLM and tool outputs instead of calling them live should reproduce the execution deterministically.
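A minimal sketch of such an event stream, assuming an in-memory, append-only log. The names TraceEvent, TraceLog, and emit are hypothetical and chosen for illustration; a production system would persist events to durable storage rather than a Python list.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TraceEvent:
    """One immutable record in the trace (hypothetical schema)."""
    seq: int        # position in the stream, assigned by the log
    kind: str       # "llm_call", "tool_call", "routing", or "compaction"
    payload: dict   # inputs and outputs for this step


class TraceLog:
    """Append-only event stream: events can be added but never edited."""

    KINDS = {"llm_call", "tool_call", "routing", "compaction"}

    def __init__(self):
        self._events = []

    def emit(self, kind, payload):
        if kind not in self.KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        # Round-trip through JSON so the stored payload is a private copy
        # and is guaranteed to be serializable.
        snapshot = json.loads(json.dumps(payload, sort_keys=True))
        event = TraceEvent(seq=len(self._events), kind=kind, payload=snapshot)
        self._events.append(event)
        return event

    def events(self):
        # Return a tuple so callers cannot append to or reorder the log.
        return tuple(self._events)

    def to_jsonl(self):
        # One JSON object per line, in emission order.
        return "\n".join(
            json.dumps(asdict(e), sort_keys=True) for e in self._events
        )


log = TraceLog()
log.emit("llm_call", {"input": "What is 2+2?", "output": "use calculator"})
log.emit("routing", {"decision": "calculator"})
log.emit("tool_call", {"tool": "calculator", "args": {"expr": "2+2"}, "result": "4"})
print(log.to_jsonl())
```

Copying the payload at emit time is a deliberate choice in this sketch: if the agent later mutates the dict it passed in, the recorded event still reflects what was seen at that step.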
Journey Context:
When an agent fails in production, the most common debugging experience is "it worked on my machine, and I cannot reproduce it." This is because agent behavior depends on LLM non-determinism, accumulated context, and tool results that change over time. Without tracing, there is no record of what the agent saw, what it decided, or why it failed. The emerging pattern is structured agent tracing: every step of the agent loop emits an event (LLM call, tool invocation, routing decision, context compaction) to an immutable log. These traces serve three purposes: debugging (inspect exactly what happened), replay (reconstruct the agent's state at any point), and evaluation (measure agent performance across runs). The tradeoff is storage cost and a small latency overhead for event emission, but the alternative, production failures you cannot debug, is far more expensive. OpenTelemetry's semantic conventions for generative AI are emerging as a standard format for these traces.
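The replay purpose can be illustrated with a toy agent loop. This is a sketch under stated assumptions, not a reference implementation: run_agent, replayer, live_llm, and live_tool are hypothetical names, the "LLM" is simulated with a random number, and the "tool" with a clock reading. The point is that a non-deterministic live run becomes reproducible when the recorded outputs are served back in order.

```python
import random
import time


def run_agent(question, llm, tool, trace):
    """Toy agent loop: plan with the LLM, call a tool, then answer."""
    plan_prompt = f"plan: {question}"
    plan = llm(plan_prompt)
    trace.append({"kind": "llm_call", "input": plan_prompt, "output": plan})

    result = tool(plan)
    trace.append({"kind": "tool_call", "input": plan, "output": result})

    answer_prompt = f"answer using {result}"
    answer = llm(answer_prompt)
    trace.append({"kind": "llm_call", "input": answer_prompt, "output": answer})
    return answer


def replayer(trace, kind):
    """Return a stand-in that serves recorded outputs of `kind` in order."""
    recorded = iter([e for e in trace if e["kind"] == kind])

    def call(inp):
        event = next(recorded, None)
        # A mismatch means the replayed run has diverged from the recording.
        if event is None or event["input"] != inp:
            raise RuntimeError(f"replay diverged at {kind} input: {inp!r}")
        return event["output"]

    return call


def live_llm(prompt):
    # Simulated non-determinism (assumption: stands in for sampling).
    return f"{prompt} [sample {random.random():.6f}]"


def live_tool(arg):
    # Simulated time-varying tool result.
    return f"price@{time.time_ns()}"


# Live run: outputs differ from run to run, but every step is recorded.
trace = []
answer = run_agent("quote for ACME", live_llm, live_tool, trace)

# Replay: no live calls are made; recorded outputs drive the same loop.
replay_trace = []
replayed = run_agent(
    "quote for ACME",
    replayer(trace, "llm_call"),
    replayer(trace, "tool_call"),
    replay_trace,
)
print(replayed == answer, replay_trace == trace)
```

Checking each replayed input against the recorded one is what turns the log into a debugging aid: if the agent code has changed since the recording, the replay fails at the first step where behavior differs instead of silently producing a different run.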
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:59:11.626441+00:00— report_created — created