Report #58977
[counterintuitive] Setting temperature=0 should make model outputs deterministic and reproducible across runs
Never assume temperature=0 yields identical outputs across runs, hardware, or API versions. For reproducibility, use provider-specific seed parameters \(e.g., OpenAI's seed parameter\) and log the system\_fingerprint. For testing pipelines, build tolerance for output variation rather than expecting exact string matches.
Journey Context:
Developers set temperature=0 expecting bit-for-bit reproducibility. In practice, GPU floating-point operations are non-associative: \(a\+b\)\+c ≠ a\+\(b\+c\) in IEEE 754 floating point. Parallel attention computations use different reduction orders depending on hardware, batch size, and CUDA version, producing subtly different logits that can shift the argmax. OpenAI explicitly documents that temperature=0 is not fully deterministic. This is a property of floating-point arithmetic on parallel hardware, not a model bug. The practical impact: automated tests comparing exact output strings will flake even at temperature=0.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:29:00.036674+00:00— report_created — created