Report #87897
[counterintuitive] Does setting temperature to 0 make LLM output deterministic?
Set the `seed` parameter (if your provider supports it) alongside temperature 0 to reduce run-to-run variation. Still make your assertions tolerant: accept an exact match, but fall back to fuzzy matching instead of requiring identical strings, because minor infrastructure differences can still cause outputs to diverge.
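A minimal sketch of such a tolerant assertion in Python, using only the standard library's `difflib`. The helper names `similarity` and `assert_similar` and the 0.9 threshold are hypothetical choices for illustration, not part of any provider's API. The model call itself (with `seed` and temperature 0) is omitted because its signature depends on your provider.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] describing how similar two strings are."""
    # SequenceMatcher.ratio() is 2*M / T, where M is the number of matched
    # characters and T is the combined length of both strings.
    return difflib.SequenceMatcher(None, a, b).ratio()


def assert_similar(actual: str, expected: str, threshold: float = 0.9) -> None:
    """Hypothetical test helper: pass on an exact match, else on a near match."""
    # Collapse whitespace so trivial formatting drift is ignored.
    norm_actual = " ".join(actual.split())
    norm_expected = " ".join(expected.split())
    if norm_actual == norm_expected:
        return  # exact match (after normalization)
    score = similarity(norm_actual, norm_expected)
    if score < threshold:
        raise AssertionError(
            f"similarity {score:.2f} below threshold {threshold}: {actual!r}"
        )


if __name__ == "__main__":
    expected = "The capital of France is Paris."
    # A run that differs by one character still passes.
    assert_similar("The capital of France is Paris!", expected)
    # A run that diverged early fails loudly.
    try:
        assert_similar("France's capital city, of course, is Paris.", expected)
    except AssertionError as exc:
        print("diverged:", exc)
```

Character-level similarity is a crude proxy. For free-form answers, a semantic check may fit better, and any threshold should be tuned against the variation you actually observe between runs.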
Journey Context:
Developers often assume that because temperature 0 means greedy decoding (always picking the highest-probability token), the model will return exactly the same string every time. However, floating-point addition is non-associative, and GPU kernels (for example FlashAttention, or reductions that use atomicAdd) can sum values in different orders depending on the GPU architecture, the degree of parallelism, or the distributed setup. The computed logits can therefore differ slightly between runs. When two candidate tokens are nearly tied, that small difference can change which token is picked, and because every later token is conditioned on that choice, a single early flip can lead to a completely divergent generation.
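The mechanism can be illustrated on the CPU with plain Python floats. This is a toy sketch under stated assumptions: a token's "logit" is modeled as a sum of made-up contributions, the two summation orders stand in for two kernels that reduce in different orders, and the hypothetical `greedy_pick` stands in for temperature-0 decoding. Real inference uses lower-precision GPU arithmetic and far larger reductions, but the principle is the same.

```python
# Toy model (hypothetical values): a token's logit is the sum of many contributions.
# Two summation orders stand in for two GPU kernels that reduce in different orders.


def logit_sequential(terms):
    """Sum in the given order: the large term first, then tiny terms one at a time."""
    total = 0.0
    for t in terms:
        # Once total is 1.0, adding 1e-16 (less than half an ulp) rounds back to 1.0.
        total += t
    return total


def logit_tiny_first(terms):
    """Sum the same terms in ascending order, so the tiny terms accumulate first."""
    total = 0.0
    for t in sorted(terms):
        total += t
    return total


def greedy_pick(logits):
    """Temperature-0 (greedy) decoding: return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


terms_a = [1.0] + [1e-16] * 10  # contributions to token 0's logit
logit_b = 1.0 + 5e-16           # token 1's logit, nearly tied with token 0

for name, reducer in [("sequential", logit_sequential), ("tiny-first", logit_tiny_first)]:
    logits = [reducer(terms_a), logit_b]
    # sequential picks token 1; tiny-first picks token 0
    print(f"{name}: logits={logits} -> token {greedy_pick(logits)}")

# The classic non-associativity example with ordinary decimals:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

The same inputs yield different logits, and greedy decoding picks a different token. In a real model, every subsequent token is conditioned on that choice, which is how one early flip changes the whole continuation.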
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:07:05.929439+00:00— report_created — created