Report #43227
[counterintuitive] temperature 0 deterministic output LLM
Set the `seed` parameter (where available) and force greedy decoding (top-k = 1), but do not rely on byte-identical output. Validate responses in your application logic with exact string matching or a fuzzy comparison, because hardware-level floating-point variation across distributed GPU clusters can still cause minor divergences.
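A minimal sketch of the validation step described above, using only the Python standard library. The parameter names in `DECODING_PARAMS`, the `generate` callable, and the 0.9 similarity threshold are all assumptions made for illustration; real parameter names and availability differ between providers, so check the API reference for the one you use.

```python
import difflib
from typing import Callable

# Hypothetical parameter names: providers differ in whether they expose
# `seed` or `top_k` at all, and in what they call them.
DECODING_PARAMS = {"temperature": 0, "top_k": 1, "seed": 1234}


def normalize(text: str) -> str:
    """Collapse whitespace and case so trivial differences do not count."""
    return " ".join(text.split()).casefold()


def outputs_agree(a: str, b: str, threshold: float = 0.9) -> bool:
    """Exact match after normalization, else fuzzy match above a threshold."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return True
    # ratio() is a similarity score in [0, 1]; the threshold is an assumption.
    return difflib.SequenceMatcher(None, na, nb).ratio() >= threshold


def check_stability(generate: Callable[[str, dict], str],
                    prompt: str, runs: int = 3) -> bool:
    """Call `generate` several times and compare each output to the first.

    `generate` is a stand-in for whatever client call your application makes.
    """
    baseline = generate(prompt, DECODING_PARAMS)
    return all(
        outputs_agree(baseline, generate(prompt, DECODING_PARAMS))
        for _ in range(runs - 1)
    )
```

For structured outputs (JSON, labels, numbers), comparing parsed values is usually a better fit than string similarity; the fuzzy check here is only a fallback for free-form text.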
Journey Context:
Developers often assume that setting temperature to 0 makes the API deterministic, and expect identical outputs for identical inputs across runs. Temperature 0 only forces greedy decoding (selecting the highest-probability token at each step). However, floating-point accumulation differs across GPU architectures, batch sizes, and distributed nodes, so the computed probabilities can vary by tiny amounts. When two candidate tokens are nearly tied, such a difference can flip the greedy choice, and every later token is then conditioned on a different prefix, so the outputs diverge. True bit-for-bit determinism requires a strictly controlled hardware and software configuration (such as deterministic mode in cuBLAS), which cloud APIs generally do not expose.
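The effect of accumulation order can be reproduced in plain Python as a toy illustration. Floating-point addition is not associative, so summing the same terms in a different order can change the last bit of the result. The two "tokens" and their contribution values below are made up for the example; real logits come from far larger reductions, but the mechanism is the same.

```python
def accumulate(terms, reverse=False):
    """Sum terms one at a time, left to right or right to left."""
    total = 0.0
    for term in (reversed(terms) if reverse else terms):
        total += term
    return total


# Toy per-token contributions (made up for illustration). Mathematically,
# both tokens have the same logit: 0.1 + 0.2 + 0.3.
TOKEN_TERMS = {
    "A": [0.1, 0.2, 0.3],
    "B": [0.3, 0.2, 0.1],
}


def greedy_token(reverse):
    """Compute each token's logit in the given order and pick the largest."""
    logits = {tok: accumulate(terms, reverse)
              for tok, terms in TOKEN_TERMS.items()}
    return max(logits, key=logits.get), logits


# Same inputs, same "temperature 0" argmax, different accumulation order.
for reverse in (False, True):
    token, logits = greedy_token(reverse)
    print(f"reverse={reverse}: logits={logits} -> greedy pick {token}")
```

With IEEE 754 double precision, the left-to-right order picks token A and the right-to-left order picks token B, even though the inputs and the decoding rule are identical. The two logits differ only in the last bit.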
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:01:49.979891+00:00— report_created — created