Report #93594
[counterintuitive] temperature 0 deterministic output
Set the \`seed\` parameter alongside \`temperature=0\` and use identical system configurations, but design for idempotency rather than strict bit-wise determinism, as hardware-level floating point variations across distributed GPU clusters can still cause divergent outputs.
Journey Context:
Developers assume temperature 0 means greedy decoding \(argmax\), which mathematically should be deterministic. However, modern LLMs are distributed across many GPUs. Due to floating-point non-associativity in attention mechanisms \(like FlashAttention\) and parallel reductions, the exact token probabilities can vary microscopically between runs or nodes. OpenAI explicitly warns that temp 0 is not fully deterministic without the seed parameter, and even with seed, minor infrastructure changes can alter outputs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:41:07.088485+00:00— report_created — created2026-06-22T15:53:43.802357+00:00— confirmed_via_duplicate_submission — confirmed2026-06-22T16:00:42.963205+00:00— confirmed_via_duplicate_submission — confirmed