Report #65461
[counterintuitive] Does temperature 0 make LLM output deterministic?
Where the provider supports it, set the `seed` parameter alongside `temperature=0`, and add exact string matching checks that detect when outputs drift; do not rely on temperature 0 alone for reproducible outputs in automated tests.
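A minimal sketch of such an exact string matching check, under stated assumptions: `count_distinct_outputs`, `stable_generate`, and `flaky_generate` are hypothetical names introduced here for illustration, and the two stand-in generators replace a real client call (which would pass `temperature=0` and a fixed `seed` where the provider offers one).

```python
import itertools
from collections import Counter
from typing import Callable


def count_distinct_outputs(generate: Callable[[str], str], prompt: str, runs: int = 4) -> Counter:
    """Call generate(prompt) `runs` times and count outputs by exact string match."""
    return Counter(generate(prompt) for _ in range(runs))


# Hypothetical stand-in for a model call that always returns the same text.
def stable_generate(prompt: str) -> str:
    return prompt.upper()


# Hypothetical stand-in for a model call whose output drifts between runs.
_variants = itertools.cycle(["yes", "Yes"])


def flaky_generate(prompt: str) -> str:
    return next(_variants)


stable_counts = count_distinct_outputs(stable_generate, "hello")
flaky_counts = count_distinct_outputs(flaky_generate, "hello")

# One distinct output means every run matched exactly; more than one means drift.
print("stable is reproducible:", len(stable_counts) == 1)
print("flaky is reproducible:", len(flaky_counts) == 1)
```

In a test suite, the same count could back an assertion that fails with the observed variants, which makes the drift visible instead of silently passing on a lucky run.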
Journey Context:
Developers often assume that temperature 0 forces the model to always pick the highest-probability token (greedy decoding) and therefore yields deterministic outputs. However, floating-point arithmetic is not associative, so distributed GPU computations (e.g., different tensor parallelism splits or hardware architectures) can produce slightly different logits from run to run. When two candidate tokens are nearly tied, the 'highest-probability' token can flip, and once a single token differs the rest of the output diverges, even at temperature 0. Some providers expose an explicit `seed` parameter to mitigate this. It is typically offered as best-effort reproducibility rather than a guarantee, so outputs can still change, for example when the backend configuration changes.
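The mechanism can be illustrated with a toy example in plain Python. This is a sketch, not a model of real GPU kernels: the three partial values, the `rival` logit, and the `greedy_pick` helper are all made up for illustration. It shows only that summing the same numbers in a different order can change the result by one unit in the last place, which is enough to change a greedy pick when two logits are nearly tied.

```python
def greedy_pick(logits: list[float]) -> int:
    """Return the index of the largest logit; ties resolve to the lowest index."""
    return max(range(len(logits)), key=logits.__getitem__)


# The same three partial sums, reduced in two different orders.
a, b, c = 0.1, 0.2, 0.3
logit_order_1 = (a + b) + c  # 0.6000000000000001
logit_order_2 = a + (b + c)  # 0.6

# A competing token whose logit is almost identical (hypothetical value).
rival = 0.6

pick_1 = greedy_pick([rival, logit_order_1])  # index 1: our token narrowly wins
pick_2 = greedy_pick([rival, logit_order_2])  # index 0: exact tie, rival wins

print(logit_order_1 == logit_order_2)  # False
print(pick_1, pick_2)                  # 1 0
```

The two runs use identical inputs and identical greedy decoding, yet select different tokens; in a real model every later token is conditioned on that choice, so the outputs diverge from that point on.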
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:21:19.963767+00:00— report_created — created