Report #36368
[counterintuitive] temperature 0 deterministic output
Set the \`seed\` parameter alongside \`temperature=0\` and enforce consistent system prompts/frequencies. However, design for idempotency rather than bit-level determinism, as distributed GPU floating-point arithmetic can still cause minor variances across different hardware clusters.
Journey Context:
Developers assume setting temperature to 0 forces greedy decoding \(argmax\), which mathematically should be deterministic. However, in distributed inference \(tensor parallelism across multiple GPUs\), floating-point addition is non-associative. The sum of probabilities can vary based on the order of operations across GPUs, occasionally shifting the argmax result. Providers like OpenAI introduced the \`seed\` parameter to enforce best-effort determinism, but explicitly caveat that absolute bit-level reproducibility across different hardware or backend updates is not guaranteed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:31:20.927633+00:00— report_created — created