Report #85023
[counterintuitive] Does temperature 0 make LLM output deterministic?
Set the `seed` parameter (where the API exposes one) alongside `temperature=0` to improve reproducibility, and treat the result as best-effort: even then, minor hardware-level variations across distributed inference systems can occasionally produce different outputs.
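One way to check whether a given setup is reproducible in practice is to send the same request several times and count the distinct outputs. The sketch below assumes a caller-supplied `generate(prompt, temperature, seed)` function that wraps whatever client is in use. That signature, and the two stand-in backends, are hypothetical and only there so the example runs offline.

```python
import itertools
from collections import Counter
from typing import Callable

# Hypothetical signature: (prompt, temperature, seed) -> generated text.
GenerateFn = Callable[[str, float, int], str]


def count_distinct_outputs(generate: GenerateFn, prompt: str,
                           runs: int = 5, seed: int = 1234) -> Counter:
    """Call `generate` repeatedly with temperature=0 and a fixed seed,
    and tally how often each distinct output appears."""
    return Counter(generate(prompt, 0.0, seed) for _ in range(runs))


# Stand-in backend that is perfectly repeatable (illustration only).
def stable_generate(prompt: str, temperature: float, seed: int) -> str:
    return f"{prompt}|t={temperature}|seed={seed}"


# Stand-in backend that occasionally drifts, mimicking hardware-level variation.
_drift = itertools.cycle(["answer A", "answer A", "answer B"])


def drifting_generate(prompt: str, temperature: float, seed: int) -> str:
    return next(_drift)


stable_counts = count_distinct_outputs(stable_generate, "hello")
drifting_counts = count_distinct_outputs(drifting_generate, "hello")

print(len(stable_counts) == 1)    # True: every run matched
print(len(drifting_counts) == 1)  # False: more than one distinct output
```

A single distinct output over a handful of runs is evidence of reproducibility, not proof; rare divergences may only show up over many runs or long completions.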
Journey Context:
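Developers often assume that temperature=0 means argmax sampling (greedy decoding) and therefore guarantees the exact same output every time. However, LLM inference runs on GPUs, where parallel reductions are non-deterministic: the order of floating-point additions can vary across runs, and because floating-point addition is not associative, the computed logits can differ in their last bits. When two candidate tokens have nearly equal logits, that difference can flip the argmax, and the output diverges from that token onward. Temperature=0 removes the stochastic sampling step, but it does not by itself guarantee identical outputs across runs. Locking the seed helps, yet it cannot eliminate hardware-level variation.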
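The effect of addition order can be reproduced on a CPU in a few lines. The sketch below is an illustration only: `reduce_in_order` and `greedy_pick` are hypothetical helpers, and the second set of values is deliberately exaggerated so the flip is obvious. In a real model the differences are tiny and only matter when logits are nearly tied.

```python
def reduce_in_order(terms, order):
    """Sum `terms` following `order`, standing in for one reduction schedule."""
    total = 0.0
    for i in order:
        total += terms[i]
    return total


def greedy_pick(logits):
    """Greedy decoding (temperature=0): index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Floating-point addition is not associative: same terms, different order.
small = [0.1, 0.2, 0.3]
forward = reduce_in_order(small, [0, 1, 2])   # 0.6000000000000001
backward = reduce_in_order(small, [2, 1, 0])  # 0.6
print(forward == backward)  # False

# Exaggerated case: the order changes the logit enough to flip the argmax.
terms = [1e16, 1.0, -1e16]
rival_logit = 0.5
logit_run_a = reduce_in_order(terms, [0, 1, 2])  # 1.0 is absorbed by 1e16 -> 0.0
logit_run_b = reduce_in_order(terms, [0, 2, 1])  # large terms cancel first -> 1.0

token_run_a = greedy_pick([logit_run_a, rival_logit])
token_run_b = greedy_pick([logit_run_b, rival_logit])
print(token_run_a, token_run_b)  # 1 0
```

No random sampling is involved in either run, yet the selected token differs, which is why greedy decoding alone does not imply bit-identical output.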
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:17:52.769582+00:00— report_created — created