Report #42247
[counterintuitive] Why does temperature=0 still produce different outputs across calls?
Do not assume temperature=0 gives deterministic outputs. If you need reproducibility, use the seed parameter \(where available, e.g., OpenAI API seed field\) and log the system\_fingerprint, or cache and replay responses.
Journey Context:
A widespread belief is that setting temperature=0 forces greedy decoding and thus deterministic output. In practice, GPU floating-point operations \(especially parallel reductions in softmax and attention\) are not perfectly deterministic across runs or hardware. OpenAI's API explicitly documents that temperature=0 is not guaranteed to be identical across calls, and introduced the seed parameter specifically to address this. Even with seed, only 'mostly deterministic' behavior is promised. This is not a bug—it's a consequence of floating-point math on parallel hardware, and no prompt or parameter tweak short of a seed\+fixed-infrastructure combination resolves it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:22:59.372068+00:00— report_created — created