Report #35059
[counterintuitive] Setting temperature to 0 guarantees deterministic LLM outputs
Use the `seed` parameter (where available) and force `top_k=1` for greedy decoding, but accept that floating-point non-determinism on GPUs, which varies across hardware and cluster configurations, still makes absolute determinism unattainable in distributed serving environments.
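To make the two knobs concrete, here is a minimal sketch of a toy sampler. The function `sample_token` and its parameters are hypothetical and written for illustration only; they are not any provider's API. It shows that `top_k=1` collapses selection to the argmax regardless of seed, while a fixed seed only makes a random draw repeatable for identical logits.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=None, seed=None):
    """Pick a token index from raw logits (illustrative sampler, not a real API)."""
    rng = random.Random(seed)  # a fixed seed makes the random draw repeatable
    # Rank token indices by logit, highest first; ties break on the lower index.
    ranked = sorted(range(len(logits)), key=lambda i: (-logits[i], i))
    if top_k is not None:
        ranked = ranked[:top_k]
    if len(ranked) == 1 or temperature == 0:
        return ranked[0]  # greedy: no randomness involved
    # Softmax weights over the surviving candidates, scaled by temperature.
    peak = logits[ranked[0]]
    weights = [math.exp((logits[i] - peak) / temperature) for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]

logits = [1.2, 3.4, 3.1, -0.5]

# top_k=1 always returns the argmax, whatever the seed or temperature.
greedy = {sample_token(logits, temperature=0.8, top_k=1, seed=s) for s in range(100)}

# With sampling enabled, a fixed seed repeats the same draw for the same logits.
seeded_a = sample_token(logits, temperature=0.8, seed=42)
seeded_b = sample_token(logits, temperature=0.8, seed=42)
```

Both guarantees hold only while the logits themselves are bit-identical between runs, which is exactly what distributed GPU serving does not promise.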
Journey Context:
Developers set `temperature=0` expecting the same input to always yield the same output. However, `temperature=0` only switches token selection to greedy decoding (taking the highest logit at each step). It does not by itself pin other sampling settings such as `top_p` (nucleus sampling) left at their defaults, and it does nothing about the inherent non-determinism of floating-point operations across different GPU architectures or distributed inference nodes. Floating-point addition is not associative, so a different kernel, batch size, or reduction order can change a logit in its last few significant bits. Even with greedy decoding, such a shift can flip the selected token when the top two logits are nearly tied, and because each token conditions the ones after it, a single flip can make long generations diverge.
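The effect can be reproduced in miniature on a CPU. The sketch below is a toy illustration in Python's 64-bit floats, not a model of any specific GPU kernel; the helper names and the logit values are made up for the example. It sums the same three terms in two orders, as two differently scheduled reductions might, and shows greedy argmax picking a different token when the rival logit is nearly tied.

```python
def sum_forward(xs):
    # Accumulate left to right.
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_backward(xs):
    # Accumulate right to left: same terms, different rounding.
    total = 0.0
    for x in reversed(xs):
        total += x
    return total

def argmax(values):
    # Returns the first index holding the maximum value.
    return max(range(len(values)), key=values.__getitem__)

terms = [0.1, 0.2, 0.3]   # contributions to one token's logit (made-up values)
rival_logit = 0.6         # a competing token that is almost tied

logit_forward = sum_forward(terms)
logit_backward = sum_backward(terms)

# Greedy decoding over [rival, candidate] under each summation order.
choice_forward = argmax([rival_logit, logit_forward])
choice_backward = argmax([rival_logit, logit_backward])

print(logit_forward == logit_backward)   # the two sums differ in the last bit
print(choice_forward, choice_backward)   # so greedy picks different tokens
```

Real inference stacks typically run in lower precision (for example 16-bit formats) with parallel reductions, so the discrepancies are larger than the last-bit difference shown here, and near-ties are common enough to matter over long outputs.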
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:18:52.716548+00:00— report_created — created