Report #44301
[counterintuitive] Why do I get different outputs at temperature 0 across API calls?
Never assume temperature 0 guarantees deterministic outputs. If reproducibility is required, use the seed parameter (where available) and log the system_fingerprint. For critical pipelines, build idempotency and output validation into application logic rather than relying on output determinism.
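One way to apply this advice is to key each request on its parameters, store the first result, and keep the fingerprint next to it. The sketch below is illustrative only: `request_key` and `cached_completion` are hypothetical helper names, the in-memory dict stands in for whatever durable store a pipeline would use, and `client` is assumed to expose `chat.completions.create` in the style of the OpenAI Python SDK.

```python
import hashlib
import json


def request_key(params: dict) -> str:
    """Hash the request parameters into a stable idempotency key."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_completion(client, params: dict, cache: dict, validate=None) -> dict:
    """Call the model once per distinct request and reuse the stored result."""
    key = request_key(params)
    if key in cache:
        # Repeated request: return the recorded output instead of re-sampling.
        return cache[key]

    response = client.chat.completions.create(**params)
    text = response.choices[0].message.content
    if validate is not None and not validate(text):
        # Reject bad output before it is stored or passed downstream.
        raise ValueError("model output failed validation")

    record = {
        "text": text,
        # Keep the fingerprint with the result so backend changes are visible.
        "system_fingerprint": response.system_fingerprint,
    }
    cache[key] = record
    return record
```

A call would pass parameters such as `{"model": <your model>, "messages": [...], "temperature": 0, "seed": 1234}`. With this design, a retry of the same request returns the stored text, so downstream steps see one answer even if the model would have produced a different one the second time.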
Journey Context:
Developers set temperature to 0 expecting bit-for-bit identical outputs every time. But floating-point addition is not associative, so parallel reductions on a GPU (for example, the sums inside the QK^T dot products and the softmax-weighted sum over values in attention) can produce slightly different results depending on thread scheduling and hardware. These differences are tiny, at the level of rounding error in each logit, but they can flip the argmax when two tokens have near-equal probability, and once one token differs the rest of the generation diverges. The issue compounds with model parallelism (tensor or pipeline parallelism across multiple GPUs). OpenAI introduced the seed parameter because temperature 0 alone was insufficient for determinism. Even with seed, determinism is best-effort rather than guaranteed, and it can only be expected while the model version and backend configuration stay the same, which is what the system_fingerprint field is meant to indicate. The counterintuitive insight: temperature 0 is not a deterministic setting. It is a greedy decoding strategy that is deterministic only if the underlying logits are identical, and floating-point non-determinism in the hardware means they sometimes are not.
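The mechanism can be reproduced on a CPU in a few lines. The sketch below is not GPU code: it uses ordinary Python floats (float64) and hypothetical helper names `sequential_sum` and `greedy_token` to show that adding the same numbers in a different order changes the result, and that the change is enough to flip a greedy choice between two nearly tied logits. The values are chosen by hand to make the effect visible.

```python
def sequential_sum(values):
    """Add values strictly left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_token(logits):
    """Temperature-0 decoding: pick the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# The same five terms, accumulated in two different orders.
terms = [1.0, 1e-16, 1e-16, 1e-16, 1e-16]
big_first = sequential_sum(terms)                    # small terms are rounded away
small_first = sequential_sum(list(reversed(terms)))  # small terms accumulate first

# A competing token whose logit sits between the two results.
competitor = 1.0000000000000002

choice_a = greedy_token([big_first, competitor])
choice_b = greedy_token([small_first, competitor])

print(big_first == small_first)  # False: the two orders disagree
print(choice_a, choice_b)        # the greedy pick differs between the two runs
```

The explicit loop is deliberate: Python's built-in `sum` may use compensated summation for floats in recent versions, which would hide the effect being shown.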
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:49:47.360913+00:00— report_created — created