Report #40885
[counterintuitive] Does setting temperature to 0 make LLM API outputs deterministic?
Set temperature=0 and top_p=1, and pass the API's seed parameter where one is available. If exactly reproducible output is required, also cache responses, because floating-point variance on distributed GPU infrastructure means repeated calls can still differ.
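The caching advice can be sketched as follows. This is a minimal illustration, not any provider's API: `call_model` is a hypothetical callable standing in for whatever client function actually sends the request, and the in-memory dict stands in for a persistent store. The idea is only that a stable hash of the full request decides whether the model is called at all.

```python
import hashlib
import json


def request_key(model, prompt, params):
    """Stable hash of everything that influences the response."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_completion(model, prompt, params, call_model, cache):
    """Return the stored response for an identical request, else call once.

    `call_model` is a hypothetical stand-in for the real API call;
    `cache` is any dict-like store (assumed in-memory here).
    """
    key = request_key(model, prompt, params)
    if key not in cache:
        cache[key] = call_model(model, prompt, params)
    return cache[key]
```

With this shape, the first response for a given request becomes the canonical one, so later runs replay it instead of depending on the backend producing the same tokens again.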
Journey Context:
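Developers often assume that temperature=0 forces a strict argmax over the vocabulary, guaranteeing the exact same output every time. However, LLM APIs run on distributed GPU clusters, and floating-point addition is non-associative, so the order in which partial sums are accumulated (e.g., in attention mechanisms) can change the last bits of a result. When two candidate tokens have nearly equal logits, such a tiny difference can flip the argmax, and every subsequent token then diverges. OpenAI introduced a seed parameter to address this, but it is documented as best-effort only: determinism is not guaranteed, and the response's system_fingerprint field is provided so that callers can detect backend changes that may alter outputs.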
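A toy illustration of the mechanism, under the assumption that the summation order can differ between runs. The logit values and token names are made up for the example, and real inference involves far larger reductions, but the effect is the same: identical inputs added in a different order yield a slightly different float, which is enough to change a greedy pick on a near-tie.

```python
def sum_in_order(terms):
    """Add floats strictly left to right, i.e. one fixed reduction order."""
    total = 0.0
    for t in terms:
        total += t
    return total


def greedy_token(logits):
    """Argmax over a dict of token -> logit (ties go to the first key)."""
    return max(logits, key=logits.get)


# The same three contributions, accumulated in two different orders.
contributions = [0.1, 0.2, 0.3]
run_1 = sum_in_order(contributions)
run_2 = sum_in_order(list(reversed(contributions)))

# Hypothetical near-tie: token "A" has a fixed logit, token "B" gets the sum.
pick_1 = greedy_token({"A": 0.6, "B": run_1})
pick_2 = greedy_token({"A": 0.6, "B": run_2})

print(run_1 == run_2)  # False
print(pick_1, pick_2)  # B A
```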
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:05:49.102904+00:00— report_created — created