Report #31114
[counterintuitive] Setting temperature to 0 guarantees deterministic API outputs
Do not rely on temperature=0 for strict determinism; implement external state checks or idempotency guards if exact reproducibility is required.
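One possible shape for such an idempotency guard is sketched below: the first response for a given request is stored, and later identical requests get the stored response instead of a fresh model call. The names IdempotentClient, call_model, and the in-memory dict store are hypothetical and stand in for whatever client and durable storage you actually use; this is an illustration, not a provider API.

```python
import hashlib
import json


class IdempotentClient:
    """Returns the first stored response for each distinct request."""

    def __init__(self, call_model):
        # call_model is a hypothetical callable: (request dict) -> response str.
        self._call_model = call_model
        # In-memory store for illustration; a real system would need durable storage.
        self._store = {}

    @staticmethod
    def request_key(request):
        # Canonical JSON so that key order in the dict does not change the hash.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def complete(self, request):
        key = self.request_key(request)
        if key not in self._store:
            # Only the first occurrence of a request reaches the model.
            self._store[key] = self._call_model(request)
        return self._store[key]


if __name__ == "__main__":
    import itertools

    counter = itertools.count()

    def flaky_model(request):
        # Stand-in for an API whose output varies between calls.
        return f"answer-{next(counter)}"

    client = IdempotentClient(flaky_model)
    req = {"prompt": "hello", "temperature": 0}
    print(client.complete(req) == client.complete(req))
```

The guard makes the result reproducible by replaying it rather than by making the model deterministic, so it only helps when the request is byte-for-byte equivalent after canonicalization.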
Journey Context:
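It is widely believed that temperature=0 means greedy decoding and therefore identical outputs every time. In practice, hosted APIs typically run on distributed GPU infrastructure, and floating-point addition is not associative: the reductions inside operations such as Flash Attention or Softmax can accumulate in a different order depending on the GPU, the kernel, or how the request is batched. The resulting logits can differ by tiny fractions between runs, and when the top two candidates are nearly tied, that is enough to flip the argmax. Once a single token differs, every later token is conditioned on a different prefix, so the outputs can diverge substantially. You may therefore get different outputs for the exact same prompt at temperature=0.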
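The mechanism can be reproduced on a CPU with a small sketch. The magnitudes below are deliberately exaggerated so the effect is visible in float64, and the token names and the competing logit of 0.5 are made up for illustration; real logit differences are far smaller and only matter when two candidates are nearly tied.

```python
def naive_sum(values):
    """Left-to-right accumulation, standing in for one reduction order."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Greedy decoding step: return the token with the highest logit."""
    return max(logits, key=logits.get)


# The same three contributions to token "A"'s logit, accumulated in two orders.
order_1 = [1e16, 1.0, -1e16]  # the 1.0 is lost when added to 1e16
order_2 = [1e16, -1e16, 1.0]  # the large terms cancel first, so 1.0 survives

logit_a_run1 = naive_sum(order_1)
logit_a_run2 = naive_sum(order_2)

# Token "B" has a fixed competing logit (hypothetical value).
run1 = greedy_pick({"A": logit_a_run1, "B": 0.5})
run2 = greedy_pick({"A": logit_a_run2, "B": 0.5})

print(logit_a_run1, logit_a_run2)  # 0.0 1.0
print(run1, run2)                  # B A
```

An explicit loop is used instead of the built-in sum(), because recent Python versions apply compensated summation to floats, which would hide the effect.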
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:36:48.242454+00:00— report_created — created