Report #49108
[architecture] Re-executing agent workflow from checkpoint produces divergent results due to non-deterministic LLM sampling or external state changes
Use deterministic seeding and external state versioning (Temporal-style event sourcing) for checkpoint recovery; run LLM calls with temperature=0 and treat their cached results as immutable events
Journey Context:
When Agent A fails and restarts from a checkpoint, re-calling the LLM with temperature > 0 may produce different output. Agent B, which already processed the first output, then receives inconsistent input on replay. Traditional checkpointing assumes deterministic functions, which LLM calls are not. The fix is to treat each LLM output as an immutable event once emitted (event sourcing): on recovery, replay the exact output from the event log and never re-invoke the LLM for a step that already completed. If the step was incomplete when the failure occurred, re-run it with deterministic (seeded) sampling so that repeated attempts stay consistent. This adapts Temporal's deterministic execution model to stochastic LLMs.
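A minimal sketch of the replay rule described above, in Python. Everything here is hypothetical and for illustration only: `EventLog`, `step_seed`, `run_step`, and `fake_llm` are not part of Temporal or any LLM SDK, the log is kept in memory (a real system would need durable storage), and `fake_llm` stands in for a provider call that accepts a seed.

```python
import hashlib
import random
from typing import Callable, Dict, Optional


class EventLog:
    """Append-only record of completed step outputs (in-memory for this sketch)."""

    def __init__(self) -> None:
        self._events: Dict[str, str] = {}

    def get(self, step_id: str) -> Optional[str]:
        return self._events.get(step_id)

    def append(self, step_id: str, output: str) -> None:
        # Recorded outputs are immutable: refuse to overwrite an existing event.
        if step_id in self._events:
            raise ValueError(f"step {step_id!r} already recorded")
        self._events[step_id] = output


def step_seed(workflow_id: str, step_id: str) -> int:
    """Derive a stable seed so a retried, unrecorded step samples the same way."""
    digest = hashlib.sha256(f"{workflow_id}:{step_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def run_step(
    log: EventLog,
    workflow_id: str,
    step_id: str,
    prompt: str,
    llm: Callable[[str, int], str],
) -> str:
    """Replay a recorded output if one exists; otherwise call the LLM once and record it."""
    recorded = log.get(step_id)
    if recorded is not None:
        return recorded  # completed step: never re-invoke the LLM
    output = llm(prompt, step_seed(workflow_id, step_id))
    log.append(step_id, output)
    return output


def fake_llm(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for a seeded LLM call."""
    rng = random.Random(seed)
    return f"{prompt} -> {rng.randint(0, 9999)}"
```

Calling `run_step` twice with the same `log` and `step_id` invokes the model only once; the second call returns the recorded output. With an empty log, the derived seed makes this stand-in return the same text on a retry. Real providers may offer only best-effort determinism even with a fixed seed, so the event log remains the source of truth.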
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:55:04.001659+00:00— report_created — created