Report #80372
[counterintuitive] Setting temperature to 0 guarantees deterministic and reproducible LLM outputs
Do not rely on temperature=0 for strict reproducibility. If you need repeatable outputs, also set a seed parameter (if the API supports one), and still expect occasional minor variations caused by hardware-level floating-point arithmetic.
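One way to act on this advice in a test suite is to measure variation instead of assuming there is none. The sketch below is a minimal illustration, not a client for any particular API: `count_distinct_outputs` and `make_fake_generate` are hypothetical names, and the fake generator stands in for whatever wrapper you write around your real call (with temperature=0 and a seed, if supported).

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def count_distinct_outputs(generate: Callable[[str], str], prompt: str, runs: int = 5) -> Counter:
    """Call `generate` several times and tally each distinct output."""
    return Counter(generate(prompt) for _ in range(runs))


def make_fake_generate() -> Callable[[str], str]:
    """Hypothetical stand-in for a real model call.

    It returns a slightly different string on one call out of five,
    to mimic an occasional divergence at temperature=0.
    """
    outputs = cycle(["Paris.", "Paris.", "Paris.", "Paris", "Paris."])

    def fake_generate(prompt: str) -> str:
        return next(outputs)

    return fake_generate


counts = count_distinct_outputs(make_fake_generate(), "Capital of France?", runs=5)
is_reproducible = len(counts) == 1  # True only if every run matched exactly
print(counts.most_common())
```

If a check like this reports more than one distinct output, exact string equality is probably the wrong assertion for that test; comparing normalized text or structured fields may be more robust.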
Journey Context:
Developers set temperature=0 expecting the model to always pick the same token, which would make outputs reproducible for testing. However, temperature=0 only means the model decodes greedily: at each step it selects the token with the highest computed probability instead of sampling. Floating-point addition is not associative, so the parallel reductions used on GPUs (for example, the sums inside matrix multiplications and the softmax over the vocabulary) can produce slightly different values from run to run when the accumulation order changes. If two tokens have nearly identical probabilities, that variance can flip the 'winner', and because every later token is conditioned on the ones before it, a single flip leads to divergent outputs. This is a hardware/math constraint, not an API bug.
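The accumulation-order effect can be reproduced on a CPU in plain Python. The sketch below is only an illustration: the numbers are made up, `accumulate` and `greedy_pick` are hypothetical helpers, and a real GPU kernel reorders far larger sums than this. It shows that summing the same three values in a different order changes the result by one rounding step, which is enough to change which of two near-tied tokens greedy selection picks.

```python
def accumulate(values):
    """Sum values strictly in the order given, like a fixed-order reduction."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Return the index of the largest logit; ties go to the lowest index."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Illustrative contributions that add up to the logit of token 1.
contributions = [0.1, 0.2, 0.3]
forward = accumulate(contributions)             # 0.1 + 0.2 + 0.3
backward = accumulate(reversed(contributions))  # 0.3 + 0.2 + 0.1

# Illustrative logit for token 0, chosen to sit right at the tie point.
token0_logit = 0.6

pick_forward = greedy_pick([token0_logit, forward])
pick_backward = greedy_pick([token0_logit, backward])

print(forward == backward)          # False
print(pick_forward, pick_backward)  # 1 0
```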
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:30:46.945321+00:00— report_created — created