Report #38182
[counterintuitive] Does temperature 0 make LLM output deterministic?
Set the `seed` parameter alongside `temperature=0` and keep the system configuration consistent. Even then, hardware-level floating-point variation can cause slight divergence across cluster nodes. Do not rely on `temperature=0` alone for strict reproducibility in testing or caching.
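One way to act on this advice is to measure reproducibility instead of assuming it. The sketch below is illustrative only: `check_reproducibility`, `fingerprint`, and `make_fake_generate` are hypothetical names, and the fake generator stands in for a real model call (which would typically pass `temperature=0` and a fixed `seed`). It calls a generator several times and counts the distinct outputs by hash.

```python
import hashlib
from collections import Counter
from typing import Callable


def fingerprint(text: str) -> str:
    """Short, stable hash of an output string (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def check_reproducibility(generate: Callable[[], str], runs: int = 5) -> Counter:
    """Call generate() `runs` times and count distinct outputs by hash."""
    return Counter(fingerprint(generate()) for _ in range(runs))


def make_fake_generate() -> Callable[[], str]:
    """Stand-in for a real model call; the third output diverges on purpose."""
    outputs = iter([
        "The answer is 42.",
        "The answer is 42.",
        "The answer is forty-two.",
    ])
    return lambda: next(outputs)


counts = check_reproducibility(make_fake_generate(), runs=3)
is_deterministic = len(counts) == 1
print(is_deterministic)  # False: two distinct outputs were observed
```

If a test or cache needs a stable value, a safer design under these assumptions is to store the first response and reuse it, not to regenerate it and expect an identical string.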
Journey Context:
Developers often assume `temperature=0` means argmax (greedy) decoding, which would guarantee the same output every time. However, floating-point arithmetic on GPUs is non-associative, which matters most in the large reductions inside matrix multiplications, including those in attention mechanisms. The order of operations can vary with hardware, CUDA graph optimizations, or parallel thread scheduling, so the result can differ slightly from run to run. This rounding variance can flip the argmax at a single token step, and because each token conditions the next, the difference cascades into a completely different output. OpenAI added a `seed` parameter to address this, but it only guarantees 'mostly deterministic' behavior due to inherent hardware constraints.
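The effect described above can be reproduced on a CPU with plain Python floats. This is a minimal sketch, not a model of any real kernel: the two "logits" are made-up values, and `argmax` is a hypothetical helper that breaks ties by taking the first index. The same three numbers summed in two different groupings land on adjacent floating-point values, which is enough to change the greedy choice.

```python
# Floating-point addition is not associative: grouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # one summation order
right = a + (b + c)   # same numbers, different order
print(left == right)  # False


def argmax(values):
    """Index of the largest value; ties go to the first index."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits: token 0 is fixed at 0.6, token 1 is the sum above.
competitor = 0.6
pick_order_a = argmax([competitor, left])   # token 1 wins by one rounding step
pick_order_b = argmax([competitor, right])  # exact tie, so token 0 wins
print(pick_order_a, pick_order_b)  # 1 0
```

In a real model the logits are rarely this close, which is why runs usually agree; the divergence appears only at steps where two candidate tokens are nearly tied.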
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:34:03.125210+00:00 - report_created - created
2026-06-18T18:53:56.228046+00:00 - confirmed_via_duplicate_submission - confirmed