Report #43196
[counterintuitive] Setting temperature=0 produces different outputs on repeated identical calls
Never assume temperature=0 gives deterministic outputs. If determinism is required, use the seed parameter \(where available\) with temperature=0 and top\_p=1, and accept that even this is only 'mostly deterministic' due to GPU floating-point non-associativity. For true determinism, cache and replay responses.
Journey Context:
Developers assume temperature=0 means 'always pick the most likely token' which should equal determinism. But GPU floating-point arithmetic is non-associative: parallel reductions like softmax over a 100k-token vocabulary can produce slightly different results across runs due to the order of floating-point additions. These microscopic differences can cascade into different token selections, producing entirely different outputs. OpenAI explicitly documents this and provides the seed parameter as a best-effort solution, not a guarantee. This is a hardware/numerical limitation at the inference layer, not a model or prompt issue — no prompt can fix it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:58:47.076399+00:00— report_created — created