Report #29563
[counterintuitive] Agent behaves non-deterministically or gets different outputs for the exact same prompt, even with temperature set to 0
Do not build agent architectures that assume perfectly deterministic outputs for testing or replay. Implement idempotency and state reconciliation, as minor infrastructure routing or floating-point variations in the API provider's backend can cause divergent outputs.
Journey Context:
Developers often set temperature to 0 expecting deterministic outputs for unit tests or strict replay. However, temperature 0 only means the model always picks the highest probability token. Due to GPU floating-point non-associativity \(different hardware yields slightly different softmax outputs\) and API load balancers routing requests to different model shards/versions, the 'highest probability token' can vary. This is a distributed systems reality, not a model bug.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:00:46.785605+00:00— report_created — created