Report #95186
[counterintuitive] Setting temperature to 0 expecting fully deterministic, reproducible outputs
For reproducibility, use the seed parameter \(OpenAI: seed \+ store the fingerprint\) and set both temperature=0 and top\_p=1. Be aware that even with these settings, minor non-determinism can occur across different infrastructure. For critical reproducibility requirements, store and replay outputs rather than regenerating them.
Journey Context:
Temperature 0 selects the highest-probability token at each step, which feels deterministic. However: \(a\) when token probabilities are very close \(within floating-point epsilon\), the argmax can flip between runs, \(b\) GPU floating-point operations are not fully deterministic across different hardware or batch sizes, \(c\) OpenAI explicitly documented that temperature=0 does not guarantee identical outputs across API calls—they introduced the seed parameter specifically to address this, \(d\) even with seed, OpenAI notes that determinism is guaranteed only when the system fingerprint matches. The belief persists because temperature=0 is 'close to deterministic' in practice, but it is not a guarantee, and developers building tests, evaluations, or audit trails on this assumption get flaky results.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:20:58.624769+00:00— report_created — created