Report #36913
[counterintuitive] Model outputs differ despite setting temperature to 0
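Accept that temperature=0 gives greedy decoding, not strict determinism. If you need reproducible output, use an API that exposes a seed parameter (where offered, this is usually best-effort), or use a constrained decoding library to narrow the space in which outputs can diverge. When running locally, keep the hardware, library versions, batch size, and floating-point precision identical across runs.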
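Before relying on any of these measures, it can help to measure how reproducible a given setup actually is. The following is a minimal sketch, not a reference implementation: `count_distinct_outputs`, `is_reproducible`, and `fake_generate` are hypothetical names introduced here, and `fake_generate` is only a stand-in for whatever function wraps your real model call.

```python
from collections import Counter
from typing import Callable


def count_distinct_outputs(
    generate: Callable[[str], str], prompt: str, runs: int = 5
) -> Counter:
    """Call `generate` on the same prompt `runs` times and tally each output."""
    return Counter(generate(prompt) for _ in range(runs))


def is_reproducible(
    generate: Callable[[str], str], prompt: str, runs: int = 5
) -> bool:
    """True if every run produced exactly the same string."""
    return len(count_distinct_outputs(generate, prompt, runs)) == 1


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (assumption, not a real API)."""
    return prompt.upper()


if __name__ == "__main__":
    # Replace fake_generate with a wrapper around your own client code.
    print(count_distinct_outputs(fake_generate, "hello", runs=3))
    print(is_reproducible(fake_generate, "hello", runs=3))
```

A passing check only shows that the sampled runs agreed; it does not prove that later runs, other prompts, or other hardware will agree.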
Journey Context:
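A widespread belief is that temperature=0 forces the model to produce the exact same token sequence every time. In fact, temperature=0 only means that the model picks the highest-probability token at each step (greedy decoding) instead of sampling. The logits themselves are still computed in floating point, and floating-point addition is not associative: GPU kernels (especially the matrix multiplications in the attention mechanism) may sum the same terms in a different order from run to run, which changes the result in the last few bits. Slight differences in top-p/top-k implementations can add to this. When two candidate tokens are nearly tied, these minute differences can flip which one counts as the 'highest probability' token, and because generation is autoregressive, a single flipped token changes everything that follows. It is a hardware/math limitation, not a prompt issue.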
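The floating-point effect described above can be reproduced on a CPU with a deliberately exaggerated toy example. This is a sketch under stated assumptions: the term values and the competing logit are made up for illustration, and real logit differences are far smaller, but the mechanism (same terms, different summation order, different argmax) is the same.

```python
def accumulate(terms):
    """Sum left to right, mimicking a reduction with one fixed order."""
    total = 0.0
    for t in terms:
        total += t
    return total


def greedy_pick(logits):
    """Return the index of the largest logit, as greedy decoding does."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical partial sums that make up the logit of token 0.
terms = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]  # same values, different reduction order

logit_run_a = accumulate(terms)      # 0.0: the 1.0 is lost when added to 1e16
logit_run_b = accumulate(reordered)  # 1.0: the large terms cancel first

competitor = 0.5  # hypothetical logit of token 1, identical in both runs

pick_run_a = greedy_pick([logit_run_a, competitor])  # token 1 wins
pick_run_b = greedy_pick([logit_run_b, competitor])  # token 0 wins

print(logit_run_a, logit_run_b)
print(pick_run_a, pick_run_b)
```

Both runs use the same inputs and the same greedy rule, yet they select different tokens purely because of the order in which the terms were added.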
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:26:18.551073+00:00— report_created — created