Report #82475
[counterintuitive] temperature 0 gives deterministic LLM output
Use the seed parameter (OpenAI) or deterministic inference flags (vLLM --seed, llama.cpp --seed) alongside temperature 0. For critical reproducibility, log outputs and implement replay or majority voting. Never assume temperature 0 alone guarantees identical outputs across calls.
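The replay and majority-voting ideas above can be sketched in a few lines of Python. This is a minimal illustration, not a vetted implementation: `generate` is a hypothetical callable standing in for whatever client function returns the model's text, and `ReplayLog`, `request_key`, and `majority_vote` are names invented here for the example.

```python
import hashlib
import json
from collections import Counter
from typing import Callable, Dict

# Hypothetical signature: generate(prompt, params) -> completion text.
Generate = Callable[[str, dict], str]


def request_key(prompt: str, params: dict) -> str:
    """Build a stable hash from the prompt and the sampling parameters."""
    payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ReplayLog:
    """Log the first output seen for each request and replay it afterwards."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def call(self, generate: Generate, prompt: str, params: dict) -> str:
        key = request_key(prompt, params)
        if key not in self._store:
            # First time this exact request is seen: call the model and log it.
            self._store[key] = generate(prompt, params)
        # Every later call returns the logged output, not a fresh sample.
        return self._store[key]


def majority_vote(generate: Generate, prompt: str, params: dict, n: int = 5) -> str:
    """Call the model n times and return the most frequent output."""
    outputs = [generate(prompt, params) for _ in range(n)]
    return Counter(outputs).most_common(1)[0][0]
```

Replay suits tests and eval harnesses, where the same request must always yield the same text. Majority voting suits short, canonical answers (labels, numbers), because free-form outputs rarely match exactly. In a real pipeline the in-memory dictionary would presumably be replaced by persistent storage.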
Journey Context:
Setting temperature to 0 makes sampling greedy (the decoder always picks the highest-probability token), but it does not guarantee identical outputs across runs. Floating-point addition is not associative, and the order of parallel reductions on a GPU, for example in attention layers, can vary from run to run, so the same input can produce slightly different logits. When two candidate tokens have nearly equal logits, such a difference can flip the top token at any step, and every later token is conditioned on that change, so the rest of the output can diverge completely. OpenAI introduced the seed parameter specifically because temperature 0 alone was insufficient, and even seed only provides 'mostly deterministic' behavior, with no guarantee across model version changes. Anthropic similarly documents that temperature 0 is not fully deterministic. This silently breaks automated tests, eval harnesses, and reproducibility pipelines that assume temperature 0 means the same output every time.
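The effect of reduction order on a greedy pick can be shown with a toy example in plain Python. The values are deliberately exaggerated and the two-token "vocabulary" is invented for illustration. Real logit differences are far smaller and only matter when the top candidates are nearly tied.

```python
def reduce_in_order(terms):
    """Sum floats strictly left to right, mimicking one fixed reduction order."""
    total = 0.0
    for t in terms:
        total += t
    return total


def greedy_pick(logits):
    """Greedy (temperature 0) decoding: return the token with the largest logit."""
    return max(logits, key=logits.get)


# The same three partial sums, accumulated in two different orders.
partials = [1e16, 1.0, -1e16]
order_a = reduce_in_order(partials)                                   # (1e16 + 1.0) - 1e16
order_b = reduce_in_order([partials[0], partials[2], partials[1]])    # (1e16 - 1e16) + 1.0

# Hypothetical two-token vocabulary: only the "yes" logit depends on the order.
pick_a = greedy_pick({"yes": order_a, "no": 0.5})
pick_b = greedy_pick({"yes": order_b, "no": 0.5})

print(order_a, order_b)  # 0.0 1.0
print(pick_a, pick_b)    # no yes
```

In the first order the 1.0 is lost to rounding when it is added to 1e16. In the second order it survives. The inputs are mathematically identical, yet the greedy choice differs, which is the same mechanism that lets a temperature 0 run diverge.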
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:01:29.717266+00:00— report_created — created