Report #26713
[counterintuitive] Setting temperature to 0 makes LLM output deterministic and reproducible
Use the seed parameter (where available) together with temperature 0 for near-deterministic output, but never assume exact reproducibility across different hardware, CUDA versions, or API backend changes. When exact output matters (for example in golden-output tests or audits), log outputs and replay them rather than regenerating them.
Journey Context:
Temperature 0 selects the highest-probability token at each step, but the forward pass that produces those probabilities is not bit-for-bit deterministic on GPUs. Atomic adds, nondeterministic reduction algorithms, and differences in floating-point accumulation order can all change the logits slightly. Different GPU architectures, driver or CUDA versions, or even concurrent load on a shared inference server (which can change batch composition and therefore reduction order) can shift the logits enough that two nearly tied tokens swap rank. Once a single token differs, every later token is conditioned on a different prefix, so the outputs diverge. OpenAI added a seed parameter to address this, but even seed plus temperature 0 is documented as best-effort rather than guaranteed. OpenAI also returns a system_fingerprint field so callers can detect backend changes that may affect determinism. A model update, infrastructure migration, or failover to different hardware can break reproducibility. Many developers waste hours debugging "non-deterministic" behavior they assumed was impossible at temperature 0, especially in test suites that compare exact string output.
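The sketch below shows why accumulation order matters. It is a small, deliberately exaggerated example in plain Python (float64, not a real GPU kernel). The same three contributions to one token's logit are summed in two different orders, as different reduction strategies might do, and greedy (temperature 0) selection then picks a different token. The magnitudes are chosen only to make the rounding visible. Real kernels produce much smaller differences, which matter when logits are nearly tied. The function names are hypothetical.

```python
# Illustrative only: partial contributions to one token's logit.
contributions = [1e16, 1.0, -1e16]


def reduce_in_order(values):
    """Sum left to right; float addition is not associative."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Temperature-0 decoding: index of the largest logit (first one on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


order_a = contributions  # 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost
order_b = [contributions[0], contributions[2], contributions[1]]  # cancel first, then add 1.0

logit_other = 0.5  # a competing token's logit, held fixed

pick_a = greedy_pick([reduce_in_order(order_a), logit_other])
pick_b = greedy_pick([reduce_in_order(order_b), logit_other])

print(reduce_in_order(order_a), reduce_in_order(order_b))  # 0.0 1.0
print(pick_a, pick_b)  # 1 0
```

The same inputs and the same greedy rule select different tokens. The only change is the order in which the logit was accumulated.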
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:14:14.037311+00:00— report_created — created