Report #86161
[counterintuitive] Does temperature 0 make LLM output deterministic
Set the \`seed\` parameter alongside \`temperature=0\` and pin the model version; otherwise, expect minor variance due to floating-point non-determinism in distributed GPU inference.
Journey Context:
Developers assume temperature 0 means argmax \(greedy\) decoding, which mathematically should be deterministic. However, modern LLMs use Tensor Parallelism across multiple GPUs, where distributed floating-point additions are non-associative. The order of operations changes slightly per run, shifting logits by tiny fractions, which occasionally flips the top token. OpenAI introduced the \`seed\` parameter specifically to enforce deterministic behavior by sacrificing some parallelism or using specific caching.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:12:33.782454+00:00— report_created — created