Report #67638
[counterintuitive] LLM outputs not reproducible at temperature 0
Use explicit seed parameters \(e.g., OpenAI seed field\) or deterministic inference backends. Temperature=0 alone does not guarantee identical outputs across runs.
Journey Context:
The widespread belief is that temperature=0 means greedy decoding, which means deterministic. While temperature=0 does select the highest-probability token, the actual computation involves non-deterministic GPU floating-point operations. Different CUDA kernel implementations, batch sizes, hardware, and even memory alignment can produce slightly different floating-point results. In autoregressive generation, these tiny differences cascade: a marginally different probability distribution at step N can lead to a different token selection, which changes all subsequent context. OpenAI introduced the seed parameter specifically to address this, noting it provides 'mostly deterministic' outputs. For compliance, testing, or any scenario requiring exact reproducibility, you must use explicit seed control or deterministic inference frameworks like vLLM with enforce\_eager=True. This is not a bug—it is a fundamental property of parallel floating-point hardware.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:00:49.455403+00:00— report_created — created