Report #99920
[frontier] Non-deterministic LLM reasoning at every step makes agents hard to test, debug, and reproduce
Use the LLM as a compiler or planner that emits a deterministic artifact (code, state machine, or workflow DAG) once, then execute that artifact with a traditional deterministic runtime rather than letting the LLM decide each step.
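Below is a minimal Python sketch of the split. It assumes a JSON workflow format invented for this example: each step names a tool and passes arguments, and a string of the form `"$id"` refers to an earlier step's output. The planner output is hard-coded as `PLAN_JSON` in place of a real LLM call. The tool names (`fetch_order`, `apply_discount`, `format_receipt`) are hypothetical. The runtime validates the artifact before running anything, then executes steps in an order computed by the standard-library `graphlib.TopologicalSorter`, so the same plan always produces the same results.

```python
import json
from graphlib import TopologicalSorter
from typing import Any, Callable


# Hypothetical deterministic tools the plan may call. Amounts are integer cents
# so that results are exact and identical across runs.
def fetch_order(order_id: str) -> dict:
    return {"id": order_id, "amount_cents": 12000}


def apply_discount(order: dict, pct: int) -> dict:
    return {**order, "amount_cents": order["amount_cents"] * (100 - pct) // 100}


def format_receipt(order: dict) -> str:
    return f"Order {order['id']}: {order['amount_cents'] / 100:.2f}"


TOOLS: dict[str, Callable[..., Any]] = {
    "fetch_order": fetch_order,
    "apply_discount": apply_discount,
    "format_receipt": format_receipt,
}

# Stand-in for the one-time LLM synthesis step (hard-coded for reproducibility).
PLAN_JSON = """
{"steps": [
  {"id": "order", "tool": "fetch_order", "args": {"order_id": "A17"}},
  {"id": "discounted", "tool": "apply_discount", "args": {"order": "$order", "pct": 10}},
  {"id": "receipt", "tool": "format_receipt", "args": {"order": "$discounted"}}
]}
"""


def is_ref(value: Any) -> bool:
    """A string of the form "$step_id" refers to another step's output."""
    return isinstance(value, str) and value.startswith("$")


def validate(plan: dict) -> list[str]:
    """Reject a malformed plan before anything runs; return a fixed execution order."""
    steps = {s["id"]: s for s in plan["steps"]}
    if len(steps) != len(plan["steps"]):
        raise ValueError("duplicate step id")
    graph: dict[str, list[str]] = {}
    for sid, step in steps.items():
        if step["tool"] not in TOOLS:
            raise ValueError(f"unknown tool: {step['tool']}")
        # Sorted so the dependency order never depends on string hashing.
        deps = sorted({v[1:] for v in step["args"].values() if is_ref(v)})
        missing = [d for d in deps if d not in steps]
        if missing:
            raise ValueError(f"step {sid!r} references unknown steps: {missing}")
        graph[sid] = deps
    # static_order() raises graphlib.CycleError (a ValueError) if the plan has a cycle.
    return list(TopologicalSorter(graph).static_order())


def execute(plan: dict) -> dict[str, Any]:
    """Deterministic runtime: no model calls, just tool calls in topological order."""
    steps = {s["id"]: s for s in plan["steps"]}
    results: dict[str, Any] = {}
    for sid in validate(plan):
        step = steps[sid]
        args = {k: results[v[1:]] if is_ref(v) else v for k, v in step["args"].items()}
        results[sid] = TOOLS[step["tool"]](**args)
    return results


plan = json.loads(PLAN_JSON)
assert execute(plan) == execute(plan)  # same artifact, same result
print(execute(plan)["receipt"])  # Order A17: 108.00
```

Because validation and execution never consult the model, the plan can be checked into version control, unit-tested, and replayed. A cyclic or malformed plan is rejected before any tool runs.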
Journey Context:
Even at temperature=0, LLM outputs are not guaranteed to be deterministic, and 79% of agent failures are specification or coordination failures rather than infrastructure failures. The emerging pattern confines the LLM to a one-time synthesis phase: it generates a workflow, a code script, or a formal plan, and a separate engine executes that artifact deterministically. This mirrors the split between a compiler and a runtime. The approach trades some runtime adaptability for reproducibility, testability, and observability. It works best for workflows with clear success criteria and available tool APIs; genuinely ambiguous exploration still needs a fallback to LLM-in-the-loop reasoning. Teams shipping agents to production increasingly prefer 'compile once, execute many' over continuous LLM reasoning.
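The 'compile once, execute many' idea and the fallback path can be sketched together. This is an illustration under stated assumptions, not a reference design. `PlanCache`, `fake_planner`, `run`, and the stub executor and fallback are all hypothetical. The planner is a stub standing in for an LLM call, and the cache key is simply a SHA-256 hash of the task text.

```python
import hashlib
import json
from typing import Callable, Optional

ALLOWED_TOOLS = {"fetch_order", "apply_discount", "format_receipt"}  # hypothetical


def validate(plan: object) -> None:
    """Minimal structural check; a real runtime would also check args and dependencies."""
    steps = plan.get("steps") if isinstance(plan, dict) else None
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must contain a non-empty 'steps' list")
    for step in steps:
        if not isinstance(step, dict) or step.get("tool") not in ALLOWED_TOOLS:
            raise ValueError(f"invalid step: {step!r}")


class PlanCache:
    """Compile once, execute many: the planner runs only on a cache miss."""

    def __init__(self, planner: Callable[[str], str]):
        self._planner = planner
        self._store: dict[str, dict] = {}
        self.planner_calls = 0

    def get_or_compile(self, task: str) -> Optional[dict]:
        key = hashlib.sha256(task.encode("utf-8")).hexdigest()
        if key in self._store:
            return self._store[key]  # replay the stored artifact; no model call
        self.planner_calls += 1
        try:
            plan = json.loads(self._planner(task))  # JSONDecodeError is a ValueError
            validate(plan)
        except ValueError:
            return None  # no usable artifact: the caller falls back
        self._store[key] = plan
        return plan


def run(task: str, cache: PlanCache,
        execute: Callable[[dict], str], fallback: Callable[[str], str]) -> str:
    plan = cache.get_or_compile(task)
    if plan is None:
        return fallback(task)  # hypothetical LLM-in-the-loop path for ambiguous tasks
    return execute(plan)


def fake_planner(task: str) -> str:
    """Stand-in for an LLM planner: emits a plan only for a well-specified task."""
    if task == "refund order A17":
        return json.dumps({"steps": [
            {"id": "order", "tool": "fetch_order", "args": {"order_id": "A17"}}]})
    return "Could you clarify what outcome you want?"


def execute_stub(plan: dict) -> str:
    return f"executed {len(plan['steps'])} step(s)"


def fallback_stub(task: str) -> str:
    return f"escalated: {task}"


cache = PlanCache(fake_planner)
print(run("refund order A17", cache, execute_stub, fallback_stub))  # executed 1 step(s)
print(run("refund order A17", cache, execute_stub, fallback_stub))  # same, served from the cache
print(run("explore why churn rose", cache, execute_stub, fallback_stub))  # escalated: explore why churn rose
print(cache.planner_calls)  # 2
```

Failed compilations are deliberately not cached, so a later retry can call the planner again. In a real system the cache key would likely also need to cover the model version and the tool schema, since a change to either can invalidate a stored plan.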
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:17:15.374074+00:00— report_created — created