Report #48013
[frontier] My AI agent behaves non-deterministically and I cannot reproduce the bug where it called the wrong tool.
Architect agents as event-sourced systems where every LLM call, tool execution, and external observation is appended to an immutable log. Use 'shadow replay' to re-execute traces against modified prompts or models deterministically by injecting recorded observations at the exact sequence points.
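A minimal sketch of the append-only log described above, assuming Python 3.9+ and JSON-serializable payloads. The names `Event`, `EventLog`, and the three `kind` values are hypothetical choices for illustration, not part of any specific framework.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Event:
    """One entry in the agent's log. frozen=True blocks field reassignment."""
    seq: int        # position in the log, assigned at append time
    kind: str       # assumed values: "llm_call", "tool_call", "observation"
    payload: dict   # prompt/response, tool name/args, or tool result


class EventLog:
    """Append-only log: events can be added and read, never edited or removed."""

    def __init__(self):
        self._events = []

    def append(self, kind, payload):
        # Round-trip through JSON so the stored payload is a private copy;
        # later changes to the caller's dict do not alter recorded history.
        stored = json.loads(json.dumps(payload))
        event = Event(seq=len(self._events), kind=kind, payload=stored)
        self._events.append(event)
        return event

    def events(self):
        # Return a tuple so callers cannot append to or reorder the log.
        return tuple(self._events)

    def to_jsonl(self):
        # One JSON object per line, suitable for writing to durable storage.
        return "\n".join(
            json.dumps(asdict(e), sort_keys=True) for e in self._events
        )


log = EventLog()
log.append("llm_call", {"prompt": "Find the refund policy", "model": "example-model"})
log.append("tool_call", {"tool": "search", "args": {"q": "refund policy"}})
log.append("observation", {"result": "Refunds within 30 days."})
print(log.to_jsonl())
```

In a real agent the `append` call would sit inside the loop at each LLM call, tool execution, and external observation, and the JSONL output would be persisted so the trace survives the process.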
Journey Context:
Print-statement debugging breaks down when LLM outputs are stochastic. Re-running the agent produces different results because of sampling temperature and because the contents of the context window drift from run to run. By treating the agent loop as a state machine with event sourcing (similar to Redux or event-driven microservices), you capture the complete causal chain: state -> prompt -> LLM choice -> tool call -> observation -> new state. Shadow replay lets you fork history at any event and ask, for example, 'what if I had used GPT-4.5 here instead of Claude 3.7?' This is essential for production agents where reproducible behavior is required for auditing and compliance.
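Shadow replay can be sketched as follows, under stated assumptions: the trace schema, `shadow_replay`, and both policy functions are hypothetical, and the policies are deterministic stand-ins for a real LLM call. No tool is executed during replay; recorded observations are injected at their original sequence points, and the replay stops at the first decision that differs from the recording.

```python
# A recorded trace (hypothetical schema): decisions alternate with observations.
RECORDED = [
    {"seq": 0, "kind": "tool_call",
     "payload": {"tool": "search", "args": {"q": "refund policy"}}},
    {"seq": 1, "kind": "observation",
     "payload": {"result": "Refunds within 30 days."}},
    {"seq": 2, "kind": "tool_call",
     "payload": {"tool": "send_email", "args": {"to": "user"}}},
    {"seq": 3, "kind": "observation", "payload": {"result": "sent"}},
]


def shadow_replay(recorded, policy):
    """Re-run `policy` against a recorded trace without executing any tool.

    Returns (history, divergence_seq). divergence_seq is None when the policy
    reproduced every recorded decision, else the seq of the first mismatch.
    """
    history = []
    for event in recorded:
        if event["kind"] == "observation":
            # Inject the recorded observation instead of calling the tool.
            history.append(event)
            continue
        decision = policy(history)
        history.append(
            {"seq": event["seq"], "kind": event["kind"], "payload": decision}
        )
        if decision != event["payload"]:
            # Later recorded observations belong to a different branch of
            # history, so stop here rather than feed them to the policy.
            return history, event["seq"]
    return history, None


def original_policy(history):
    # Stand-in for the model and prompt that produced the recording.
    if not history:
        return {"tool": "search", "args": {"q": "refund policy"}}
    return {"tool": "send_email", "args": {"to": "user"}}


def candidate_policy(history):
    # Stand-in for a modified prompt or a different model.
    if not history:
        return {"tool": "search", "args": {"q": "refund policy"}}
    return {"tool": "create_ticket", "args": {"priority": "low"}}


_, same = shadow_replay(RECORDED, original_policy)
forked, diverged_at = shadow_replay(RECORDED, candidate_policy)
print(same, diverged_at, forked[-1]["payload"]["tool"])
```

With a real model in place of these stand-ins, the replay itself is deterministic only up to the model call; the sketch pins down the inputs the model sees at each step, which is what makes the wrong tool call inspectable.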
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:04:01.482332+00:00— report_created — created