Report #51147
[counterintuitive] Does setting temperature to 0 make LLM API outputs deterministic?
Use the `seed` parameter (if the API supports it) and set `top_p` to 1.0 to make outputs as reproducible as the API allows; do not rely on `temperature=0` alone.
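A minimal sketch of how the recommendation above could be checked empirically. The helper names (`reproducible_params`, `distinct_outputs`, `fake_generate`) are hypothetical and not part of any provider's SDK, and `fake_generate` is only a stand-in for a real API call. Whether a given API accepts `seed` at all is an assumption to verify against its documentation.

```python
from typing import Any, Callable, Dict, List


def reproducible_params(seed: int = 1234) -> Dict[str, Any]:
    """Hypothetical request settings: temperature 0, top_p 1.0, fixed seed."""
    return {"temperature": 0, "top_p": 1.0, "seed": seed}


def distinct_outputs(
    generate: Callable[[Dict[str, Any]], str],
    params: Dict[str, Any],
    runs: int = 5,
) -> List[str]:
    """Call `generate` `runs` times and return the distinct outputs, first-seen order."""
    seen: List[str] = []
    for _ in range(runs):
        text = generate(params)
        if text not in seen:
            seen.append(text)
    return seen


def fake_generate(params: Dict[str, Any]) -> str:
    # Stand-in for a real API call (assumption); swap in your provider's client.
    return f"stub completion for seed={params.get('seed')}"


if __name__ == "__main__":
    outputs = distinct_outputs(fake_generate, reproducible_params())
    print(f"{len(outputs)} distinct output(s) across runs")
```

With a real client in place of `fake_generate`, more than one distinct output across identical requests would indicate that the settings alone are not enough for that backend.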
Journey Context:
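Developers often assume that temperature 0 forces argmax (greedy) decoding and therefore yields exactly the same output every time. In practice, floating-point addition on GPUs is not associative, so the order in which partial sums are reduced, which can vary with batch composition, kernel choice, and hardware, shifts the logits slightly. When two tokens are nearly tied, that shift can flip the argmax, and every later token then diverges. Providers may also route identical requests to different replicas or hardware, so even true greedy decoding is not guaranteed to repeat. In addition, some backends may treat temperature 0 as a very small positive temperature rather than a true argmax, in which case the sampling path, including any `top_p` truncation, still runs; setting `top_p` to 1.0 removes that truncation as a variable. Where a `seed` parameter is offered, it fixes the sampler's random number generator so that the sampling step is repeatable. It does not remove floating-point variation in the logits, so seeded reproducibility is typically best-effort rather than guaranteed.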
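The floating-point effect can be illustrated on a CPU with ordinary Python floats. This is a toy sketch, not a model of any real backend: the token names, the logit values, and the `greedy_pick` helper are invented for illustration. It sums the same three contributions in two orders and shows that a greedy pick against a nearly tied rival changes.

```python
def greedy_pick(logits: dict) -> str:
    """Return the key with the largest logit (ties go to the first key)."""
    return max(logits, key=logits.get)


# The same three contributions to one logit, reduced in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# A competing token whose logit is nearly tied with the sums above.
rival = 0.6

pick_a = greedy_pick({"no": rival, "yes": left_to_right})
pick_b = greedy_pick({"no": rival, "yes": right_to_left})

if __name__ == "__main__":
    print(left_to_right == right_to_left)  # False
    print(pick_a, pick_b)                  # yes no
```

The two sums differ only in the last bit, yet that is enough to change which token wins. Once one token differs, the rest of the generation is conditioned on a different prefix.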
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:20:12.510765+00:00— report_created — created