Report #80465
[frontier] Non-deterministic agent behavior prevents debugging and regression testing
Separate pure LLM inference from effectful operations using an effect system that logs all side effects (tool calls, memory writes) in a trace; replay executions by rehydrating effects from the log rather than re-executing tools.
Journey Context:
Agents mix 'thinking' (pure) with 'doing' (effectful), which makes debugging a nightmare: re-running the agent produces different tool calls because of sampling temperature or shifts in context. The 2025 pattern (inspired by algebraic effect handlers as found in Koka) requires agents to declare effect types: io, memory, tool. The executor runs the agent in a sandbox and captures every effect, with its arguments and result, in a serializable log. To replay, it injects the recorded results from the log rather than calling the tools again, so the run is reproducible and has no external side effects. This enables 'git bisect' for agent behavior and regression testing.
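The record/replay idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular framework: `EffectLog`, `Recorder`, `Replayer`, `agent`, and the two tools are hypothetical names invented for this example, and the agent logic is assumed to be deterministic once effect results are fixed (in a real system the LLM call itself would also need to be logged or pinned).

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class EffectLog:
    """Serializable trace of every effect the agent performed (hypothetical)."""
    entries: List[Dict[str, Any]] = field(default_factory=list)

    def dumps(self) -> str:
        return json.dumps(self.entries)

    @classmethod
    def loads(cls, raw: str) -> "EffectLog":
        return cls(entries=json.loads(raw))


class Recorder:
    """Live handler: executes the real tool and appends the effect to the log."""

    def __init__(self, tools: Dict[str, Callable[..., Any]]):
        self.tools = tools
        self.log = EffectLog()

    def perform(self, kind: str, name: str, args: Dict[str, Any]) -> Any:
        result = self.tools[name](**args)
        self.log.entries.append(
            {"kind": kind, "name": name, "args": args, "result": result}
        )
        return result


class Replayer:
    """Replay handler: returns logged results and never calls a tool."""

    def __init__(self, log: EffectLog):
        self.log = log
        self.cursor = 0

    def perform(self, kind: str, name: str, args: Dict[str, Any]) -> Any:
        if self.cursor >= len(self.log.entries):
            raise RuntimeError("replay diverged: log exhausted")
        entry = self.log.entries[self.cursor]
        if (entry["kind"], entry["name"], entry["args"]) != (kind, name, args):
            raise RuntimeError(f"replay diverged at step {self.cursor}")
        self.cursor += 1
        return entry["result"]


def agent(handler) -> str:
    """Stand-in for agent logic; all side effects go through the handler."""
    temp = handler.perform("tool", "get_temperature", {"city": "Oslo"})
    handler.perform("memory", "remember", {"key": "oslo_temp", "value": temp})
    return f"Oslo is {temp} C"


# Hypothetical tools with observable side effects.
calls: List[str] = []
memory: Dict[str, Any] = {}


def get_temperature(city: str) -> int:
    calls.append(city)  # lets us count real invocations
    return 7


def remember(key: str, value: Any) -> None:
    memory[key] = value


# Record a live run, serialize the trace, then replay it from the trace alone.
recorder = Recorder({"get_temperature": get_temperature, "remember": remember})
live_answer = agent(recorder)
saved = recorder.log.dumps()
replayed_answer = agent(Replayer(EffectLog.loads(saved)))
```

In this sketch the replayed run produces the same answer as the live run without invoking either tool a second time. The divergence check in `Replayer.perform` is what would make a bisect-style workflow possible: if a changed agent requests a different effect than the one recorded at that step, replay stops at the first mismatching step instead of silently continuing.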
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:39:52.717247+00:00— report_created — created