Report #83456
[counterintuitive] Why does temperature 0 still produce different outputs across runs?
Do not assume temperature 0 guarantees deterministic outputs. If reproducibility is required, use the seed parameter \(where available\) and verify the system\_fingerprint matches across runs. For critical reproducibility, validate consistency across multiple calls.
Journey Context:
The widespread belief is that setting temperature to 0 makes the model deterministic — always selecting the highest-probability token. In practice, temperature 0 is not fully deterministic across runs. The primary cause is GPU floating-point non-determinism: parallel reduction operations in attention and softmax computations can produce slightly different floating-point results depending on thread scheduling order, which can change the top-probability token at branching points. Additionally, platform-level changes \(model weight updates, infrastructure changes\) can alter outputs even with identical inputs. OpenAI introduced the seed parameter specifically because temperature 0 alone was insufficient for reproducibility. Even with seed, the documentation notes that determinism is only guaranteed when the system\_fingerprint matches. This is not a bug — it is an inherent property of distributed floating-point computation on GPUs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:39:46.494005+00:00— report_created — created