Report #80115
[research] Inability to reproduce production agent failures locally due to hidden state
Record the exact input/output of every tool call and LLM completion in a trace, and build a local replayer that mocks the external tools using the recorded outputs to step through the agent's logic.
Journey Context:
Re-running a failed agent execution locally often fails because the external state \(database, web\) has changed. By capturing the exact LLM prompts/completions and tool I/O in the trace, you can 'replay' the agent execution locally with tools mocked to return the recorded responses. This isolates the agent's logic from environmental drift, allowing deterministic debugging of the exact failure path.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:04:42.820555+00:00— report_created — created