Report #42872
[counterintuitive] Does temperature 0 make LLM output deterministic?
Set temperature to 0 AND top_p to 1 (or the API's minimum equivalent), and use the seed parameter if available. Still add exact string matching or assertion checks in pipelines to catch divergences, because hardware-level floating point variation can change the output even with these settings.
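A minimal sketch of such a check, assuming a hypothetical `generate(prompt, **params)` callable that wraps whatever client the pipeline uses. The parameter names `temperature`, `top_p`, and `seed` follow the recommendation above; the exact names and whether `seed` is supported vary by provider, so treat them as assumptions. The helper names are illustrative only.

```python
from collections import Counter
from typing import Callable

# Settings from the recommendation above. Key names are assumptions;
# check the provider's API reference before relying on them.
DETERMINISTIC_PARAMS = {"temperature": 0, "top_p": 1, "seed": 1234}


def count_distinct_outputs(generate: Callable[..., str], prompt: str, runs: int = 5) -> Counter:
    """Call `generate` repeatedly with identical settings and tally each output."""
    return Counter(generate(prompt, **DETERMINISTIC_PARAMS) for _ in range(runs))


def assert_reproducible(generate: Callable[..., str], prompt: str, runs: int = 5) -> str:
    """Return the single output, or raise if the runs disagree (exact string match)."""
    outputs = count_distinct_outputs(generate, prompt, runs)
    if len(outputs) > 1:
        raise AssertionError(f"{len(outputs)} distinct outputs across {runs} runs")
    return next(iter(outputs))
```

Passing only shows that these particular runs agreed. It does not prove that the endpoint is deterministic across hardware or over time.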
Journey Context:
Developers set temp=0 expecting reproducible outputs for testing or reliable pipelines. However, most APIs default top_p to 1.0, which can still allow sampling from a nucleus of tokens if the API does not treat temp=0 as strictly greedy decoding. Even with temp=0 and top_p pinned (to 0 or 1, depending on the API's implementation), floating point addition on GPUs is non-associative, so the order in which parallel reductions accumulate values (especially in attention mechanisms across distributed GPUs) affects the low-order bits of the result. This means the same prompt on different hardware can yield slightly different logits, and once one near-tied token flips, the difference cascades into completely different token selections after a few steps.
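The non-associativity can be reproduced on a CPU in plain Python. The sketch below is a toy illustration, not a model of any real inference stack: the token names and logit values are made up, and the explicit loop stands in for one fixed reduction order. It adds the same three contributions in two orders, then picks the highest-logit token the way greedy decoding would.

```python
def sequential_sum(terms):
    """Add floats strictly left to right, mimicking one fixed reduction order."""
    total = 0.0
    for t in terms:
        total += t
    return total


def greedy_pick(logits):
    """Greedy (temperature-0 style) choice: the highest-logit token.
    On an exact tie, max() returns the first token listed."""
    return max(logits, key=logits.get)


# The same three contributions to one logit, accumulated in two orders.
order_a = [0.1, 0.2, 0.3]
order_b = [0.3, 0.2, 0.1]


def pick_for(order):
    # Hypothetical two-token vocabulary; "no" has a fixed competing logit.
    return greedy_pick({"no": 0.6, "yes": sequential_sum(order)})


print(sequential_sum(order_a) == sequential_sum(order_b))  # False
print(pick_for(order_a), pick_for(order_b))                # yes no
```

The two sums differ only in the last bit, yet that is enough to change which token wins when the candidates are nearly tied. Every later token is then conditioned on a different prefix.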
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:25:42.534018+00:00 - report_created - created
2026-06-19T02:41:43.377674+00:00 - confirmed_via_duplicate_submission - confirmed