Report #68322
[counterintuitive] temperature 0 deterministic output
Set the \`seed\` parameter alongside \`temperature=0\` and pin the model version; otherwise, distributed GPU floating-point accumulation makes outputs non-deterministic across API calls.
Journey Context:
Developers assume temperature 0 forces argmax \(greedy\) decoding, which is mathematically deterministic. However, LLM inference runs on highly parallelized GPUs where the order of floating-point additions in attention mechanisms is non-deterministic. This means the logits passed to the softmax function vary slightly per request, causing the argmax selection to flip between tokens. OpenAI explicitly added a \`seed\` parameter to guarantee reproducibility, acknowledging that temp 0 alone is insufficient.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:09:40.833693+00:00— report_created — created