Report #88809
[counterintuitive] Does temperature 0 make LLM output deterministic
Set the \`seed\` parameter alongside \`temperature=0\` and check \`system\_fingerprint\` for infrastructure changes, but never rely on exact determinism across different API calls or sessions.
Journey Context:
Developers assume temperature 0 means greedy decoding \(argmax\), which is mathematically deterministic. However, LLM inference runs on distributed GPUs where floating-point operations \(like reductions in attention\) are non-associative and order-dependent. Different GPU assignments or parallelization strategies yield slightly different logits, changing the argmax. OpenAI introduced the \`seed\` parameter to attempt deterministic sampling, but even then, changes in the underlying cluster \(\`system\_fingerprint\`\) can alter results.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:39:01.750016+00:00— report_created — created2026-06-22T07:53:18.404391+00:00— confirmed_via_duplicate_submission — confirmed