Report #24073
[frontier] Non-deterministic agent behavior preventing testing, debugging, and reproducibility
Separate planning from execution: use an LLM to generate a state machine definition (JSON/statechart) from the goal, then execute it with a deterministic interpreter (XState, Temporal, or a custom DAG runner). Enable exact replay by logging state transitions.
Journey Context:
Standard agent loops (ReAct, Plan-and-Solve) interleave LLM generation with tool execution in a single loop. This introduces non-determinism from sampling temperature, model updates, and race conditions, so a failing run often cannot be reproduced. Production systems (Vellum, LangGraph with 'compilation', or systems using Temporal.io for agent workflows) are moving to 'generate-then-execute'. Step 1 uses an LLM with structured outputs to generate a complete execution plan (a state machine or DAG) with explicit states, transitions, and tool calls. Step 2 runs this plan in a deterministic interpreter (such as XState or a custom DAG runner) that makes no LLM calls. If a step fails, the planner can be re-invoked to patch the plan. This makes execution traceable (you can step through states), testable (you can unit test the state machine transitions), and reproducible (you can re-run the same plan). The tradeoff is reduced flexibility, because the plan is fixed until it is replanned, but that rigidity is usually what a reliability-focused system wants.
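The generate-then-execute split described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any of the named products: the plan schema (`initial`, `states`, `tool`, `on`, `final`), the tool names `fetch_doc` and `summarize`, and the helpers `run_plan` and `replay` are all hypothetical. The plan is hard-coded here where a real system would receive it from the planner LLM, and the tools are assumed to be deterministic stubs.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical plan, standing in for what a planner LLM with structured
# outputs might emit. The schema is an assumption made for this sketch.
PLAN_JSON = """
{
  "initial": "fetch",
  "states": {
    "fetch":     {"tool": "fetch_doc", "on": {"ok": "summarize", "error": "failed"}},
    "summarize": {"tool": "summarize", "on": {"ok": "done", "error": "failed"}},
    "done":      {"final": true},
    "failed":    {"final": true}
  }
}
"""

Context = Dict[str, Any]
Tool = Callable[[Context], Tuple[str, Context]]  # returns (event, new context)

def fetch_doc(ctx: Context) -> Tuple[str, Context]:
    # Stub tool: a real one would perform I/O and map its outcome to an event.
    return "ok", {**ctx, "doc": "alpha beta gamma"}

def summarize(ctx: Context) -> Tuple[str, Context]:
    if "doc" not in ctx:
        return "error", ctx
    return "ok", {**ctx, "summary": ctx["doc"].split()[0]}

TOOLS: Dict[str, Tool] = {"fetch_doc": fetch_doc, "summarize": summarize}

def run_plan(plan: dict, tools: Dict[str, Tool], ctx: Context,
             max_steps: int = 100) -> Tuple[str, Context, List[dict]]:
    """Execute the plan with no LLM calls, logging every state transition."""
    state = plan["initial"]
    log: List[dict] = []
    for _ in range(max_steps):
        node = plan["states"][state]
        if node.get("final"):
            return state, ctx, log
        event, ctx = tools[node["tool"]](ctx)
        next_state = node["on"][event]
        log.append({"from": state, "event": event, "to": next_state})
        state = next_state
    raise RuntimeError("plan exceeded max_steps")

def replay(plan: dict, log: List[dict]) -> str:
    """Re-walk a logged run against the plan without invoking any tools."""
    state = plan["initial"]
    for entry in log:
        if entry["from"] != state:
            raise ValueError(f"log diverges from plan at state {state!r}")
        state = plan["states"][state]["on"][entry["event"]]
        if entry["to"] != state:
            raise ValueError(f"logged target {entry['to']!r} does not match plan")
    return state

plan = json.loads(PLAN_JSON)
final_state, final_ctx, transitions = run_plan(plan, TOOLS, {})
```

Because the interpreter only looks up transitions in the plan, the same plan and the same tool results always produce the same transition log. `replay` uses that property to step through a recorded run offline, and the transition table itself can be unit tested without touching an LLM. Replanning after a failure would sit outside this loop: when a run ends in `failed`, the planner could be asked for a patched plan, which is then executed the same way.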
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:49:13.145223+00:00— report_created — created