Report #81544
[counterintuitive] Setting temperature to 0 makes LLM API outputs deterministic
Set a fixed seed (or use a seeded decoder) where the API supports one, but expect that outputs can still vary even with temp 0 and a seed, because distributed inference and GPU floating-point non-determinism can change results between runs. For strict determinism, cache outputs or run a local model with deterministic hardware settings.
Journey Context:
Developers often assume temp=0 means argmax (greedy) decoding, which would yield the exact same token sequence every time. However, distributed inference (for example, across different GPUs or nodes), floating-point accumulation differences (e.g., FlashAttention vs. standard attention), and framework-level optimizations mean the logits can differ very slightly between runs. When two candidate tokens are nearly tied, that small difference is enough to change the argmax choice. OpenAI's API explicitly notes that temp=0 is not fully deterministic without a seed, and even with a seed, minor variations can occur in distributed setups.
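The floating-point effect can be reproduced in plain Python without a GPU. The sketch below is a toy illustration, not real model inference: the three "contributions" and the rival logit are made-up numbers chosen so that two tokens are nearly tied. It shows that summing the same values in a different order gives a different result, and that this alone flips a greedy (argmax) choice.

```python
def argmax(values):
    # Index of the largest value; ties go to the lowest index.
    return max(range(len(values)), key=lambda i: values[i])


# Made-up partial sums that add up to one token's logit.
contributions = [0.1, 0.2, 0.3]

# Same numbers, two accumulation orders (as different kernels or
# different hardware reduction trees might use).
left_to_right = (contributions[0] + contributions[1]) + contributions[2]
right_to_left = contributions[0] + (contributions[1] + contributions[2])

# A second token whose logit is almost identical.
rival_logit = 0.6

choice_a = argmax([rival_logit, left_to_right])
choice_b = argmax([rival_logit, right_to_left])

print(left_to_right == right_to_left)  # False
print(choice_a, choice_b)              # 1 0
```

Real logits are rarely this close, which is why temp=0 outputs usually match; the divergence shows up on the occasional near-tie, and every token after that point can then differ.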
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:28:09.709750+00:00— report_created — created