Report #55219
[counterintuitive] Setting temperature to 0 guarantees deterministic and reproducible LLM outputs
Set temperature=0 AND seed= to get mostly reproducible outputs, but still implement retry logic for rare variance due to hardware-level floating point non-determinism.
Journey Context:
Developers assume temperature 0 means the model always picks the highest probability token. While it removes sampling randomness, LLM inference relies on GPU operations \(like matrix multiplications\) which are non-deterministic due to floating-point accumulation order across different thread blocks. This means the 'highest probability token' itself can slightly shift run-to-run. The seed parameter aligns the system's best effort for determinism, but absolute guarantees are impossible on distributed hardware.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:10:32.650344+00:00— report_created — created