Report #35693
[counterintuitive] Why are LLM outputs non-deterministic even with temperature set to 0?
Never assume reproducibility at temperature=0. If determinism is required, use the seed parameter (where available) combined with constrained output, or implement external verification and retry logic. In testing pipelines, do not rely on exact output matching; assert on the structure or meaning of the output instead.
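The verification-and-retry advice above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `generate` is a hypothetical stand-in for whatever model call you use, and `validate`, `generate_verified`, `make_fake_model`, and the "label" schema are names invented for this example. The check is on the parsed structure, not on the exact string.

```python
import json
from typing import Callable, Optional

ALLOWED_LABELS = {"positive", "negative"}  # assumed schema, for illustration only


def validate(raw: str) -> Optional[dict]:
    """Return the parsed payload if it satisfies the constraint, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and data.get("label") in ALLOWED_LABELS:
        return data
    return None


def generate_verified(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the (hypothetical) model until its output passes validation."""
    for _ in range(max_attempts):
        result = validate(generate(prompt))
        if result is not None:
            return result
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def make_fake_model(outputs):
    """Stand-in for a real model: returns the given outputs one per call."""
    remaining = iter(outputs)
    return lambda prompt: next(remaining)


# Two runs of the "same" request that differ in wording and extra fields.
fake_model = make_fake_model(
    ['Sure! {"label": "positive"}', '{"label": "positive", "confidence": 0.9}']
)
result = generate_verified(fake_model, "Classify: great product")
print(result["label"])  # positive
```

A test written this way asserts on `result["label"]` and tolerates run-to-run differences in whitespace, key order, or extra fields, which an exact string comparison would not.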
Journey Context:
The widespread belief is that temperature=0 means deterministic, reproducible outputs, which would be useful for testing and debugging. At temperature=0 the model selects the highest-probability token at each step (greedy decoding), which sounds deterministic. However, floating-point addition is non-associative: (a + b) + c can differ from a + (b + c) in the last bits, so the order of accumulation affects the result. GPU kernels, particularly the parallel reductions in softmax and attention, do not fix that order. Different hardware, batch sizes, CUDA versions, or even repeated runs on the same hardware can produce slightly different floating-point values. When two candidate tokens have nearly equal probabilities, such a difference can flip which one is selected, and every later token then conditions on a different prefix, so the outputs diverge. OpenAI introduced the seed parameter because temperature=0 alone was insufficient for reproducibility, and even with seed, determinism is offered only on a best-effort basis, not guaranteed. The accurate mental model: temperature=0 removes sampling randomness but not hardware-level floating-point non-determinism. Reproducibility requires both sampling control and deterministic hardware execution, which GPU inference stacks do not guarantee by default.
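The non-associativity described above can be reproduced on a CPU in plain Python. The sketch below is a deliberately exaggerated toy, not a model of any real GPU kernel: `accumulate` and `greedy_pick` are hypothetical helpers, and the magnitudes are chosen so the effect is visible. It shows the same terms summed in two orders and a greedy (temperature=0 style) choice that flips as a result.

```python
# Floating-point addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False


def accumulate(terms):
    """Sum terms strictly left to right, as one possible reduction order."""
    total = 0.0
    for term in terms:
        total += term
    return total


def greedy_pick(logits):
    """Temperature=0 style selection: return the token with the largest logit."""
    return max(logits, key=logits.get)


# The same three contributions to one logit, accumulated in two orders.
# 1e17 + 1.0 rounds back to 1e17, so the 1.0 is lost in the first order.
logit_order_a = accumulate([1e17, 1.0, -1e17])  # 0.0
logit_order_b = accumulate([1e17, -1e17, 1.0])  # 1.0

# A competing token whose logit sits between the two results.
pick_a = greedy_pick({"yes": logit_order_a, "no": 0.5})
pick_b = greedy_pick({"yes": logit_order_b, "no": 0.5})
print(pick_a, pick_b)  # no yes
```

In real inference the discrepancies are far smaller than in this toy, so a flip needs two candidates that are nearly tied, but one flip is enough to change everything generated after it.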
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:23:07.710251+00:00— report_created — created