Report #88300
[counterintuitive] Why are temperature=0 outputs not reproducible across identical API calls?
Do not assume temperature=0 guarantees deterministic outputs. For reproducibility, either use provider-specific seed parameters and accept approximate rather than exact reproducibility, or cache and replay responses.
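A minimal sketch of the cache-and-replay option, assuming the model call can be wrapped in a plain function. The names `ReplayCache`, `flaky_model`, and the request fields are hypothetical and stand in for whatever client and request shape you actually use; the fake model deliberately returns a different string on every call to mimic nondeterminism.

```python
import hashlib
import itertools
import json


class ReplayCache:
    """In-memory cache that replays the first response seen for a request."""

    def __init__(self, call_model):
        # call_model is a hypothetical callable: request dict -> response string.
        self._call_model = call_model
        self._store = {}

    @staticmethod
    def _key(request):
        # Canonical JSON so that key order in the dict does not change the hash.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def complete(self, request):
        key = self._key(request)
        if key not in self._store:
            # First time we see this request: call the model once and keep the result.
            self._store[key] = self._call_model(request)
        return self._store[key]


# Stand-in for a real API call; returns a new string on every invocation.
_counter = itertools.count()


def flaky_model(request):
    return f"answer-{next(_counter)}"


cache = ReplayCache(flaky_model)
request = {"model": "example-model", "temperature": 0, "prompt": "hi"}
first = cache.complete(request)
second = cache.complete(request)  # replayed from the cache, not recomputed
other = cache.complete({"model": "example-model", "temperature": 0, "prompt": "bye"})
```

Here `first` and `second` are identical because the second call never reaches the model. For reproducibility across processes, the same idea applies with the dictionary swapped for a file or database keyed by the same hash.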
Journey Context:
Developers often set temperature=0 expecting bit-exact reproducibility across runs with the same prompt. That expectation does not hold. Even at temperature 0 (greedy decoding), outputs can vary because floating-point addition is non-associative: the reductions inside attention and linear layers run in parallel on the GPU, and the order in which partial sums are combined depends on CUDA thread scheduling, which varies across runs. The resulting differences in the logits are tiny, but when the top two candidate tokens are nearly tied they can flip the greedy choice, and once one token differs the rest of the completion diverges. Distributed inference, batched inference, and different hardware configurations all change computation paths as well. This is a hardware-level floating-point issue, not a model or API bug. OpenAI explicitly documents that temperature=0 is not guaranteed to be deterministic and provides a seed parameter as a best-effort mitigation, but even seeded calls are not guaranteed to match across model versions or hardware. If you need true determinism, cache the output and replay it.
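The non-associativity described above can be reproduced on a CPU with ordinary Python floats. This is only an illustration under stated assumptions: the three values, the competing logit of 0.6, and the `greedy_pick` helper are made up for the demo, and a real model sums thousands of terms in lower precision rather than three doubles.

```python
import math

# The same three values, combined in two different orders.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

orders_agree = left_first == right_first              # exact comparison
close_enough = math.isclose(left_first, right_first)  # tolerance-based comparison


def greedy_pick(logits):
    """Return the index of the largest logit; ties go to the lowest index."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical near-tie: token 0 has logit 0.6, token 1's logit is the sum above.
pick_a = greedy_pick([0.6, left_first])
pick_b = greedy_pick([0.6, right_first])
```

The two sums differ in the last bit, so `orders_agree` is false while `close_enough` is true, and the greedy pick lands on a different token in each case. This is also why comparing outputs with a tolerance or a semantic check tends to be more robust than asserting exact string equality.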
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:47:49.072124+00:00— report_created — created