Report #74344
[counterintuitive] Does temperature 0 make LLM output deterministic?
Set the `seed` parameter alongside `temperature=0` for best-effort reproducibility, but implement application-level idempotency checks rather than relying on bit-perfect determinism across distributed API calls.
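One way to read "application-level idempotency" is to store the first response for each distinct request and replay it, so downstream logic never depends on two API calls agreeing. The following is a minimal sketch under that assumption. `IdempotentClient`, `request_key`, `flaky_model`, and the payload fields are hypothetical names chosen for illustration, not part of any provider SDK, and the in-memory dict stands in for whatever durable store the application would really use.

```python
import hashlib
import itertools
import json


class IdempotentClient:
    """Replays the first stored response for each distinct request payload."""

    def __init__(self, call_model):
        # call_model is any callable taking a payload dict and returning text
        # (hypothetical stand-in for a real API client).
        self._call_model = call_model
        self._store = {}  # in-memory only; a real system would persist this

    @staticmethod
    def request_key(payload):
        # Canonical JSON so that key order in the dict does not change the hash.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def complete(self, payload):
        key = self.request_key(payload)
        if key not in self._store:
            self._store[key] = self._call_model(payload)
        return self._store[key]


# Stub that returns a different answer on every call, imitating a backend
# whose output drifts between otherwise identical requests.
_calls = itertools.count()


def flaky_model(payload):
    return f"answer-{next(_calls)}"


client = IdempotentClient(flaky_model)
request = {"model": "example-model", "temperature": 0, "seed": 42,
           "prompt": "Summarize the incident report."}

first = client.complete(request)
second = client.complete(request)
print(first == second)
```

The design choice here is to make repeatability a property of the application's storage rather than of the model: the backend is called once per distinct request, and retries see the stored result.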
Journey Context:
Developers often assume that temperature 0 enforces strictly deterministic greedy decoding. It does force argmax selection, but cloud APIs distribute requests across heterogeneous GPU clusters. Floating-point addition is non-associative, so the order in which partial sums are accumulated changes the rounding; differences in hardware, CUDA versions, or compilation paths therefore yield slightly different logits. When the top two logits are nearly tied, that small difference is enough to flip the argmax token between calls, and every later token then diverges as well. Use the `seed` parameter where available, but treat it as best effort: it can only give repeatable output when the model snapshot and hardware are also unchanged, and API providers guarantee neither.
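The rounding effect can be reproduced without a GPU. The sketch below sums the same three numbers in two orders and feeds each result to a greedy pick. The function names, the token labels, and the values are made up for illustration, and the magnitudes are deliberately exaggerated so the difference shows up in float64; real logits differ by far smaller amounts.

```python
def accumulate(terms):
    """Sum floats strictly in the given order, like a sequential reduction."""
    total = 0.0
    for t in terms:
        total += t
    return total


def greedy_pick(logits):
    """Greedy (temperature 0) decoding: return the token with the largest logit."""
    return max(logits, key=logits.get)


# Hypothetical partial sums contributing to the logit of token "yes".
partials = [1e16, -1e16, 1.0]

logit_order_a = accumulate(partials)            # (1e16 + -1e16) + 1.0
logit_order_b = accumulate(reversed(partials))  # (1.0 + -1e16) + 1e16

# The competing token "no" has a fixed logit of 0.5 in both runs.
pick_a = greedy_pick({"yes": logit_order_a, "no": 0.5})
pick_b = greedy_pick({"yes": logit_order_b, "no": 0.5})

print(logit_order_a, logit_order_b)  # 1.0 0.0
print(pick_a, pick_b)                # yes no
```

In the second order, `1.0` is absorbed when it is added to `-1e16`, because adjacent float64 values near that magnitude are 2 apart. The inputs are identical in both runs; only the accumulation order differs, and that alone changes which token greedy decoding selects.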
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:23:05.626973+00:00— report_created — created