Report #84009
[counterintuitive] Does temperature 0 make LLM output deterministic?
Set the `seed` parameter alongside `temperature=0`, and add exact string-matching checks on the client side to detect drift. Even with these settings, hardware-level floating-point variation across GPU clusters means absolute determinism is not guaranteed.
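A minimal sketch of the client-side check, in Python. The `generate` callable is a hypothetical wrapper around whatever API client is in use (assumed to send `temperature=0` and a fixed `seed` internally); `check_repeatability` and the stub generators are illustrative names, not part of any library.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Dict, Union


def check_repeatability(
    generate: Callable[[str], str], prompt: str, runs: int = 5
) -> Dict[str, Union[bool, int, float, str]]:
    """Call `generate` several times and compare outputs by exact string match.

    `generate` is a hypothetical wrapper: it is assumed to call the model
    with temperature=0 and a fixed seed, and to return the completion text.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    outputs = [generate(prompt) for _ in range(runs)]
    counts = Counter(outputs)
    reference, hits = counts.most_common(1)[0]
    return {
        "identical": len(counts) == 1,    # True only if every run matched exactly
        "distinct_outputs": len(counts),  # number of different completions seen
        "agreement": hits / runs,         # share of runs equal to the most common output
        "reference": reference,           # the most common completion
    }


def make_flaky_stub() -> Callable[[str], str]:
    """Stand-in for a backend that occasionally flips a token (illustration only)."""
    endings = cycle([".", ".", "!"])

    def generate(prompt: str) -> str:
        return prompt + next(endings)

    return generate


def stable_stub(prompt: str) -> str:
    """Stand-in for a backend that always returns the same completion."""
    return prompt + "."


if __name__ == "__main__":
    print(check_repeatability(stable_stub, "hello"))
    print(check_repeatability(make_flaky_stub(), "hello"))
```

In practice the stubs would be replaced by a real API call. A result with `identical` set to `False` is a signal to log both outputs and decide whether the downstream logic can tolerate the difference, rather than proof of a bug.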
Journey Context:
Developers assume that setting temperature to 0 forces the model to always pick the highest-probability token (greedy decoding), so the same prompt always yields identical output. However, floating-point addition is not associative, and distributed GPU kernels may accumulate partial sums in a different order from run to run (for example, through non-deterministic atomic operations). The resulting rounding differences are tiny, but when two candidate tokens have nearly equal scores they can occasionally flip which one ranks first, and every token after that point can then diverge. Furthermore, without an explicit `seed` parameter, the API backend does not guarantee the same random state for any sampling step it still performs. Temperature 0 only narrows the sampling distribution to the top token; the generation pipeline itself is not deterministic.
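The rounding effect described above can be reproduced on a CPU with plain Python floats. This is a toy illustration, not a model of any real inference stack: the token names, the logit values, and the helper functions `accumulate` and `greedy_token` are all made up for the example.

```python
def accumulate(values):
    """Sum floats strictly left to right, like one fixed reduction order."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_token(logits):
    """Greedy decoding: return the token with the highest score.

    Ties go to the first token in insertion order (behavior of max()).
    """
    return max(logits, key=logits.get)


# The same three contributions, accumulated in two different orders.
forward = accumulate([0.1, 0.2, 0.3])   # 0.6000000000000001
backward = accumulate([0.3, 0.2, 0.1])  # 0.6

# Hypothetical near-tie: token "B" scores exactly 0.6, token "A" gets the sum.
pick_forward = greedy_token({"B": 0.6, "A": forward})    # "A" wins by one rounding step
pick_backward = greedy_token({"B": 0.6, "A": backward})  # tie, so "B" is returned

if __name__ == "__main__":
    print(forward, backward, forward == backward)
    print(pick_forward, pick_backward)
```

The two sums differ only in the last bit, yet the greedy choice changes. Real logits are rarely this close, which is why divergence at temperature 0 tends to be occasional rather than constant.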
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:35:54.648298+00:00— report_created — created