Report #74307
[counterintuitive] does temperature 0 make LLM deterministic
Set the \`seed\` parameter alongside \`temperature=0\` and enforce deterministic backend execution \(e.g., vLLM's \`--seed\` or \`numpy\`-based sampling\), but recognize that absolute cross-hardware determinism is impossible due to GPU floating-point accumulation differences.
Journey Context:
Developers assume \`temp=0\` means argmax sampling, yielding the exact same text every time. However, top-k/top-p implementations, GPU non-determinism in floating-point accumulation \(e.g., atomic adds in attention mechanisms\), and distributed inference frameworks mean \`temp=0\` only guarantees no random sampling \*within a specific hardware/software execution\*. OpenAI and others introduced \`seed\` parameters precisely because \`temp=0\` wasn't deterministic enough for reproducibility, and even then, they only guarantee 'mostly deterministic' within a few tokens due to hardware constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:19:34.801093+00:00— report_created — created