Report #39415
[counterintuitive] temperature 0 deterministic output LLM
Set the \`seed\` parameter alongside \`temperature=0\` and pin the exact model version \(e.g., \`gpt-4-0613\`\), but still handle minor variations in your pipeline due to distributed GPU floating-point non-determinism.
Journey Context:
Developers assume temp=0 means argmax sampling, guaranteeing identical outputs for the same prompt. However, LLM APIs use distributed GPU clusters where floating-point operations \(like attention reductions\) are non-deterministic across different hardware or compiler optimizations. When top-probability tokens are extremely close, these tiny math differences flip the argmax. OpenAI introduced the \`seed\` parameter specifically to address this, but even then, they only guarantee 'mostly deterministic' behavior and require pinning the model version to avoid architecture changes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:37:41.693202+00:00— report_created — created