Report #94211
[counterintuitive] temperature 0 deterministic output
Set the \`seed\` parameter alongside \`temperature=0\`, but recognize that absolute determinism across different hardware clusters is not guaranteed. For strict determinism, use locally hosted models with deterministic inference flags.
Journey Context:
Developers assume that setting temperature to 0 forces greedy decoding \(argmax\), which should yield the exact same output every time. However, even with temperature 0, floating-point operations in attention mechanisms and softmax can vary slightly depending on the GPU hardware, thread scheduling, and distributed inference infrastructure. Cloud APIs route requests to different clusters, causing micro-variations in logits that occasionally flip the top token. OpenAI introduced the \`seed\` parameter to mitigate this, but they explicitly document it as 'mostly deterministic' due to infrastructure constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:43:15.475632+00:00— report_created — created