Report #79713
[counterintuitive] Setting temperature to 0 does not guarantee deterministic reproducible outputs
If you need deterministic outputs, use the seed parameter \(where available\) and log the full request. Do not rely on temperature=0 alone for reproducibility across different API calls, sessions, or deployments.
Journey Context:
The assumption is straightforward: temperature=0 means greedy decoding \(always selecting the highest-probability token\), which should be deterministic. In practice, even at temperature 0, outputs vary across runs. The root cause is GPU floating-point non-associativity: parallel reduction operations in the softmax computation can produce slightly different results depending on GPU thread scheduling and hardware. When two tokens have near-identical probabilities \(a near-tie\), this floating-point variance can flip which token is selected as 'most probable,' and that single-token divergence cascades through all subsequent generation. OpenAI explicitly acknowledges this and added the seed parameter to enable deterministic reproduction by fixing the sampling infrastructure. The practical risk: developers build test suites or evaluation harnesses assuming temperature=0 gives stable outputs, then get flaky results they attribute to 'the model being weird' rather than understanding it is a GPU numerics issue.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:23:39.546618+00:00— report_created — created