Report #89926
[counterintuitive] Does temperature 0 make LLM output deterministic?
Set the `seed` parameter alongside `temperature=0`, and enforce `top_k=1` where the API exposes it. Even then, floating-point reductions on distributed hardware can be evaluated in a different order from run to run, so outputs may still diverge slightly across infrastructure.
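One way to check whether these settings are enough on a given deployment is to repeat the same request and count the distinct outputs. The sketch below is a minimal illustration, not any provider's API: `DETERMINISTIC_PARAMS`, `count_distinct_outputs`, and `fake_generate` are hypothetical names, and the parameter names (especially `top_k`) are assumptions that vary by provider.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical request settings; real parameter names vary by provider,
# and `top_k` in particular is not accepted by every API.
DETERMINISTIC_PARAMS: Dict[str, object] = {"temperature": 0, "seed": 1234, "top_k": 1}


def count_distinct_outputs(generate: Callable[..., str], prompt: str, runs: int = 5) -> Counter:
    """Call `generate` repeatedly with identical settings and tally the outputs."""
    return Counter(generate(prompt, **DETERMINISTIC_PARAMS) for _ in range(runs))


def fake_generate(prompt: str, **params: object) -> str:
    # Stand-in for a real client call so the sketch runs offline.
    return prompt.upper()


tally = count_distinct_outputs(fake_generate, "hello")
print(len(tally), "distinct output(s) across 5 runs")
```

Against a real endpoint, replace `fake_generate` with a wrapper around your client. More than one key in `tally` means the settings did not pin the output on that infrastructure.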
Journey Context:
Developers often assume that temperature 0 strictly enforces greedy decoding (argmax) and therefore makes output deterministic. However, floating-point arithmetic on GPUs is non-associative, and distributed inference (tensor/pipeline parallelism) changes the order in which partial results are reduced. The resulting differences in the logits are tiny, but when two candidate tokens are nearly tied they can change which one is the argmax, and every later token is then conditioned on a different prefix. Additionally, without a `seed` parameter, the API's internal sampling state is not fixed. Temperature 0 only ensures that the model picks the highest-probability token given the exact same computation path, and that path is not guaranteed to be the same across different hardware or parallelization splits.
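The effect of reduction order can be reproduced on a CPU with ordinary floats. The sketch below is a deliberately exaggerated toy, not a model of any real inference stack: the magnitudes are chosen so the rounding is obvious, and `ordered_sum` and `greedy_pick` are hypothetical helper names. Real logit differences are far smaller, but the mechanism is the same.

```python
def ordered_sum(values):
    """Add floats strictly left to right.

    A manual loop is used because the built-in sum() may apply
    compensated summation on newer Python versions.
    """
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Return the token with the highest logit (what temperature 0 does)."""
    return max(logits, key=logits.get)


# The same three contributions to token A's logit, reduced in two orders.
logit_a_order1 = ordered_sum([1e16, 1.0, -1e16])  # 1.0 is absorbed by 1e16
logit_a_order2 = ordered_sum([1e16, -1e16, 1.0])  # large terms cancel first
logit_b = 0.5

pick1 = greedy_pick({"A": logit_a_order1, "B": logit_b})
pick2 = greedy_pick({"A": logit_a_order2, "B": logit_b})
print(logit_a_order1, logit_a_order2, pick1, pick2)
```

Both runs use greedy selection on mathematically identical inputs, yet they pick different tokens because the additions were performed in a different order.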
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:32:02.248369+00:00 — report_created — created
2026-06-22T09:42:13.622111+00:00 — confirmed_via_duplicate_submission — confirmed