Report #39986
[frontier] How do teams debug complex multi-step agent failures when standard logging only shows final output and not the decision trajectory?
Implement Structured Logging for Agent Trajectory (SLAT) using OpenTelemetry-style spans where each LLM call, tool execution, and reasoning step is a structured event with parent-child relationships. Export to a queryable format (JSON Lines or OTLP) that supports trajectory replay and diffing.
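A minimal sketch of the span idea, using only the Python standard library rather than the OpenTelemetry SDK. The `TrajectoryLogger` class, the `add_event` helper, and all span and attribute names (`turn`, `llm_call`, `tool_call`, `example-model`) are hypothetical and chosen for illustration; only the `trace_id` and `parent_id` fields come from the report itself.

```python
import contextlib
import io
import json
import time
import uuid


class TrajectoryLogger:
    """Hypothetical helper: records agent steps as spans, one JSON object per line."""

    def __init__(self, sink):
        self.sink = sink                   # any file-like object with .write()
        self.trace_id = uuid.uuid4().hex   # one trace per agent turn
        self._stack = []                   # span_ids of the currently open spans

    @contextlib.contextmanager
    def span(self, name, **attributes):
        record = {
            "trace_id": self.trace_id,
            "span_id": uuid.uuid4().hex[:16],
            # The innermost open span (if any) becomes the parent.
            "parent_id": self._stack[-1] if self._stack else None,
            "name": name,
            "attributes": attributes,
            "events": [],
            "start": time.time(),
        }
        self._stack.append(record["span_id"])
        try:
            yield record
        finally:
            self._stack.pop()
            record["end"] = time.time()
            # A span is written when it closes, so children precede parents.
            self.sink.write(json.dumps(record) + "\n")


def add_event(span, name, **attributes):
    """Attach a timestamped event (for example a reasoning step) to an open span."""
    span["events"].append({"name": name, "time": time.time(), "attributes": attributes})


# Example turn: one LLM call with a recorded thought, then one tool call.
sink = io.StringIO()
log = TrajectoryLogger(sink)
with log.span("turn", user_input="find the invoice"):
    with log.span("llm_call", model="example-model", temperature=0.2) as llm:
        add_event(llm, "thought", text="Need to search before answering.")
        llm["attributes"]["total_tokens"] = 128
    with log.span("tool_call", tool="search", input={"q": "invoice"}) as tool:
        tool["attributes"]["output"] = {"hits": 3}

spans = [json.loads(line) for line in sink.getvalue().splitlines()]
```

Because each span is emitted when it closes, the root `turn` span appears last in the file; a reader reconstructs the tree from `parent_id` rather than from line order. In a real deployment the same structure would more likely be produced by an OpenTelemetry SDK and exported over OTLP.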
Journey Context:
Standard logging captures text output or simple JSON, which makes it impractical to trace why an agent chose tool A over tool B at step 5, or how its context evolved. SLAT treats agent execution as a distributed trace: each 'turn' is a trace, each LLM invocation is a span with attributes (model, temperature, token count), and each tool call is a child span with input/output payloads. Crucially, the 'thought process' (chain-of-thought) is recorded as span events. This allows post-hoc analysis such as 'Show me all trajectories where the agent used Tool X after receiving a 4xx error.' The format should be OpenTelemetry-compatible (OTLP) so that it integrates with Jaeger or Tempo, or at minimum structured JSON with trace_id and parent_id fields. The alternative, plain-text logs, requires regex parsing, whereas SLAT enables SQL-like querying of execution paths.
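The example query above ("Tool X after a 4xx error") could be sketched as follows, assuming the spans were exported as JSON Lines and loaded into an in-memory SQLite table. The sample records, the `seq` ordering field, the `status_code` and `tool` attribute names, and the tool names `http_get` and `tool_x` are all assumptions made for illustration; they are not part of any standard schema.

```python
import json
import sqlite3

# Hypothetical exported spans, one JSON object per line.
# t1: 404 then tool_x (should match); t2: tool_x then 403; t3: 200 then tool_x.
JSONL = """\
{"trace_id": "t1", "span_id": "a0", "parent_id": null, "seq": 0, "name": "turn", "attributes": {}}
{"trace_id": "t1", "span_id": "a1", "parent_id": "a0", "seq": 1, "name": "tool_call", "attributes": {"tool": "http_get", "status_code": 404}}
{"trace_id": "t1", "span_id": "a2", "parent_id": "a0", "seq": 2, "name": "tool_call", "attributes": {"tool": "tool_x"}}
{"trace_id": "t2", "span_id": "b0", "parent_id": null, "seq": 0, "name": "turn", "attributes": {}}
{"trace_id": "t2", "span_id": "b1", "parent_id": "b0", "seq": 1, "name": "tool_call", "attributes": {"tool": "tool_x"}}
{"trace_id": "t2", "span_id": "b2", "parent_id": "b0", "seq": 2, "name": "tool_call", "attributes": {"tool": "http_get", "status_code": 403}}
{"trace_id": "t3", "span_id": "c0", "parent_id": null, "seq": 0, "name": "turn", "attributes": {}}
{"trace_id": "t3", "span_id": "c1", "parent_id": "c0", "seq": 1, "name": "tool_call", "attributes": {"tool": "http_get", "status_code": 200}}
{"trace_id": "t3", "span_id": "c2", "parent_id": "c0", "seq": 2, "name": "tool_call", "attributes": {"tool": "tool_x"}}
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spans (trace_id TEXT, span_id TEXT, parent_id TEXT, "
    "seq INTEGER, name TEXT, tool TEXT, status_code INTEGER)"
)
for line in JSONL.splitlines():
    span = json.loads(line)
    attrs = span.get("attributes", {})
    conn.execute(
        "INSERT INTO spans VALUES (?, ?, ?, ?, ?, ?, ?)",
        (span["trace_id"], span["span_id"], span["parent_id"], span["seq"],
         span["name"], attrs.get("tool"), attrs.get("status_code")),
    )

# Self-join: find traces where a tool_x span occurs later than a 4xx span.
QUERY = """
SELECT DISTINCT err.trace_id
FROM spans AS err
JOIN spans AS nxt
  ON nxt.trace_id = err.trace_id AND nxt.seq > err.seq
WHERE err.status_code BETWEEN 400 AND 499
  AND nxt.tool = 'tool_x'
ORDER BY err.trace_id
"""
matches = [row[0] for row in conn.execute(QUERY)]
print(matches)
```

Only `t1` satisfies the ordering constraint, which is the kind of distinction that is awkward to express with regex over plain-text logs. A trace backend such as Jaeger or Tempo would expose its own query interface instead of SQL; the SQLite table here only stands in for "a queryable format".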
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:35:25.451707+00:00— report_created — created