Report #62055
[counterintuitive] Why are my temperature=0 API calls producing different outputs across runs?
Do not assume temperature=0 guarantees deterministic outputs. Where available, pass the seed parameter and log the backend fingerprint returned with each response (for example, OpenAI's system_fingerprint) so you can tell when the serving stack changed. Design pipelines to tolerate minor output variation rather than assuming exact reproducibility.
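A minimal sketch of the logging advice, assuming each run is stored as a small record holding the request seed, the fingerprint the API returned, and the output text. The `RunRecord` type and the `check_reproducibility` and `output_hash` helpers are hypothetical names for illustration. With the OpenAI Python SDK the fingerprint would come from `response.system_fingerprint`, but this helper makes no API calls itself.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    # Hypothetical log entry for one temperature=0 call.
    seed: Optional[int]
    fingerprint: Optional[str]  # e.g. OpenAI's system_fingerprint; may be None
    output: str


def output_hash(text: str) -> str:
    # Collapse runs of whitespace first, so trivial spacing differences
    # are not reported as divergence, then take a short stable digest.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def check_reproducibility(records):
    """Group runs by (seed, fingerprint); return groups whose outputs differ.

    Divergence across different fingerprints is expected (the backend
    changed). Divergence within one group is the case worth a closer look,
    though it can still be ordinary floating-point noise.
    """
    groups = defaultdict(set)
    for r in records:
        groups[(r.seed, r.fingerprint)].add(output_hash(r.output))
    return {key: hashes for key, hashes in groups.items() if len(hashes) > 1}


runs = [
    RunRecord(42, "fp_a", "Paris"),
    RunRecord(42, "fp_a", "Paris "),      # whitespace only: not flagged
    RunRecord(42, "fp_b", "Paris."),      # different backend: separate group
    RunRecord(7, "fp_a", "Paris"),
    RunRecord(7, "fp_a", "Paris, France"),
]
print(check_reproducibility(runs))
```

Flagging only within-group differences keeps expected backend-driven changes out of the alert stream; a stricter or looser normalization step can be swapped into `output_hash` depending on how much variation the pipeline can tolerate.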
Journey Context:
Developers set temperature=0 expecting bit-exact reproducibility across runs. When outputs vary, they file bug reports assuming something is broken. In reality, even with temperature=0 (greedy decoding, which always selects the highest-probability token), outputs can differ across runs because of floating-point non-determinism in GPU computation. Floating-point addition is not associative, so summing the same values in a different order can give a slightly different result. Parallel kernels such as Flash Attention may accumulate values in different orders across GPU architectures, CUDA versions, and kernel configurations, which shifts the resulting probabilities very slightly. Usually this changes nothing, but when two candidate tokens are nearly tied, the shift can flip which one greedy decoding picks, and because every later token is conditioned on the earlier ones, the rest of the completion can diverge from that point. OpenAI introduced the seed parameter because temperature=0 alone does not guarantee determinism, and even with a seed it describes determinism as best effort, returning a system_fingerprint so callers can detect backend changes. This is not a bug: it is a fundamental property of parallel floating-point computation, where limited precision makes the result depend on the order of accumulation.
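The order dependence is easy to reproduce on a CPU. This toy sketch assumes three made-up partial sums contributing to one token's logit: adding them in two orders gives two different float64 results, and greedy selection over a near-tie flips between them. Real GPU kernels work in float16, bfloat16, or float32 over far more terms, so the drift can be larger; the values, the two-token vocabulary, and the helper names `accumulate` and `greedy_pick` are illustrative only.

```python
def accumulate(values):
    """Sum values left to right, as one sequential reduction order would."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Return the index of the largest logit (temperature=0 style choice).

    Python's max() returns the first maximal item, so an exact tie goes to
    the lower index.
    """
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical partial sums for token 1's logit. A parallel kernel might
# combine them in either order depending on hardware and software versions.
contributions = [0.1, 0.2, 0.3]
logit_run_1 = accumulate(contributions)            # (0.1 + 0.2) + 0.3
logit_run_2 = accumulate(reversed(contributions))  # (0.3 + 0.2) + 0.1

print(logit_run_1, logit_run_2)  # 0.6000000000000001 0.6

# Token 0 has a fixed logit of 0.6; token 1's logit depends on summation order.
print(greedy_pick([0.6, logit_run_1]))  # 1
print(greedy_pick([0.6, logit_run_2]))  # 0
```

The same effect occurs without an exact tie: whenever the accumulated rounding error exceeds the gap between the top two logits, greedy decoding can choose a different token.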
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:38:51.720204+00:00— report_created — created