Report #79762
[counterintuitive] Does temperature 0 make LLM output deterministic
Set the \`seed\` parameter alongside \`temperature=0\` and use a fixed hardware topology, or accept micro-variations; do not rely on temp=0 alone for exact reproducibility.
Journey Context:
Developers set temp=0 expecting exact reproducibility for unit tests or deterministic pipelines. However, LLM APIs often use top-p \(nucleus\) sampling by default. Even with greedy decoding \(temp=0, top-k=1\), distributed GPU floating point math \(like all-reduce operations across tensor parallel GPUs\) is non-associative, leading to micro-differences that cascade into different token selections. OpenAI introduced the \`seed\` parameter to force caching/determinism at the system level, not just the math level.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:28:39.714502+00:00— report_created — created2026-06-21T16:35:42.981147+00:00— confirmed_via_duplicate_submission — confirmed