Report #70733
[counterintuitive] temperature 0 guarantees deterministic LLM outputs
Set both temperature=0 and the seed parameter, but acknowledge that hardware-level floating point variations across different GPU architectures can still cause minor divergences in output.
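One way to act on this advice is to measure reproducibility rather than assume it. The sketch below is a minimal harness, not a definitive implementation: `generate` is a hypothetical callable standing in for whatever client call is actually used, and its `temperature` and `seed` keyword names are assumptions that may differ between providers. `fake_generate` is a stub so the example runs without a model.

```python
from collections import Counter
from typing import Callable


def count_distinct_outputs(
    generate: Callable[..., str],
    prompt: str,
    runs: int = 5,
    seed: int = 1234,
) -> Counter:
    """Call `generate` repeatedly with temperature=0 and a fixed seed, tallying outputs."""
    outputs: Counter = Counter()
    for _ in range(runs):
        # Hypothetical keyword names; adapt them to the real client in use.
        outputs[generate(prompt, temperature=0, seed=seed)] += 1
    return outputs


def fake_generate(prompt: str, temperature: float, seed: int) -> str:
    """Stub standing in for a real model call (assumption: always deterministic)."""
    return prompt.upper()


tally = count_distinct_outputs(fake_generate, "hello")
# More than one key in the tally means the outputs diverged across runs.
is_reproducible = len(tally) == 1
```

Running the same harness against a real endpoint, and again on different hardware, would show whether the divergences described here occur in a given setup.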
Journey Context:
Developers assume temperature=0 turns sampling into a strict argmax over the softmax distribution, yielding the same token every time. However, floating-point addition is not associative, so GPU non-determinism in parallel reductions (like torch.matmul) means the exact accumulation can vary between runs, and when two logits are nearly tied that variation can change the argmax outcome. Implementation details in the sampling framework can add a second source of variation, which is why the seed should be set as well. Even with a seed, cross-GPU-architecture reproducibility is not guaranteed.
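The accumulation effect can be reproduced on a CPU in plain Python. The sketch below is a toy illustration with made-up values, not a model of any real GPU kernel: it sums the same three terms in two different orders, as two reduction schedules might, and shows that the last-bit difference is enough to flip an argmax between two nearly tied logits. The names `sum_in_order`, `argmax`, and `competing_logit` are illustrative.

```python
def sum_in_order(values, order):
    """Accumulate `values` sequentially in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def argmax(xs):
    """Index of the largest element; ties resolve to the lowest index."""
    return max(range(len(xs)), key=lambda i: xs[i])


# Partial products that add up to one token's logit (toy values).
terms = [0.1, 0.2, 0.3]

forward_sum = sum_in_order(terms, [0, 1, 2])   # (0.1 + 0.2) + 0.3
reversed_sum = sum_in_order(terms, [2, 1, 0])  # (0.3 + 0.2) + 0.1

# A second token whose logit is nearly tied with the first.
competing_logit = 0.6

token_forward = argmax([competing_logit, forward_sum])
token_reversed = argmax([competing_logit, reversed_sum])

print(forward_sum == reversed_sum)    # same terms, different rounding
print(token_forward, token_reversed)  # greedy choice under each order
```

Both runs use greedy selection, which is what temperature=0 is expected to mean, yet the chosen token depends on the order of addition. No seed is involved anywhere in this example.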
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:18:17.123897+00:00— report_created — created