Report #46665
[counterintuitive] Setting temperature to 0 guarantees deterministic LLM outputs
If strict determinism is required, cache outputs and reuse them for repeated requests, or implement application-level locking. If the provider supports a 'seed' parameter, use it, but verify that outputs are actually consistent across runs.
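A minimal sketch of the caching and verification advice above. It assumes the model call is wrapped in a function that takes a request dict and returns a string. `call_model`, `request_key`, `cached_completion`, and `is_consistent` are hypothetical names used for illustration, not part of any provider SDK.

```python
import hashlib
import json


def request_key(request: dict) -> str:
    """Build a stable cache key from the full request (prompt, temperature, seed, ...)."""
    # sort_keys makes the key independent of dict insertion order.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_completion(call_model, request: dict, cache: dict) -> str:
    """Return the stored output for a repeated request; call the model only on a miss."""
    key = request_key(request)
    if key not in cache:
        cache[key] = call_model(request)  # hypothetical wrapper around the provider call
    return cache[key]


def is_consistent(call_model, request: dict, runs: int = 5) -> bool:
    """Send the same request several times and report whether every output matched."""
    outputs = {call_model(request) for _ in range(runs)}
    return len(outputs) == 1
```

The in-memory dict only covers a single process; a shared store would be needed if several workers must return the same string. `is_consistent` is one way to check whether a provider's seed setting actually holds across runs before relying on it.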
Journey Context:
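Developers set temp=0 expecting the exact same output string every time. However, temp=0 only makes decoding greedy: at each step the sampler picks the argmax token instead of drawing from the distribution. It does not make the floating-point arithmetic that produces the logits deterministic across different hardware, distributed inference nodes, or even successive runs on the same GPU, because some CUDA kernels use non-deterministic reductions. Floating-point addition is not associative, so a different summation order can shift a logit slightly, and when two candidate tokens are nearly tied that shift can change the argmax and every token generated after it. Distributed serving architectures (like vLLM or multi-node cloud APIs) route requests to different GPUs, so exact reproducibility at temp=0 cannot be relied on without dedicated seed or hardware-pinning features.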
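The effect of summation order can be illustrated on a CPU with plain Python floats. This is a toy sketch, not a reproduction of any GPU kernel: the three terms, the rival logit value, and the token names are made up for illustration.

```python
def accumulate(terms):
    """Add terms one by one in the given order (a stand-in for a reduction)."""
    total = 0.0
    for t in terms:
        total += t
    return total


terms = [0.1, 0.2, 0.3]
forward = accumulate(terms)             # (0.1 + 0.2) + 0.3 -> 0.6000000000000001
backward = accumulate(reversed(terms))  # (0.3 + 0.2) + 0.1 -> 0.6


def greedy_pick(logits: dict) -> str:
    """Greedy decoding step: return the token with the highest logit.

    On an exact tie, max() returns the first entry it encountered.
    """
    return max(logits, key=logits.get)


# Hypothetical rival token whose logit sits right at the boundary.
RIVAL_LOGIT = 0.6
pick_forward = greedy_pick({"rival": RIVAL_LOGIT, "candidate": forward})
pick_backward = greedy_pick({"rival": RIVAL_LOGIT, "candidate": backward})

print(forward == backward)          # False: same terms, different order
print(pick_forward, pick_backward)  # candidate rival
```

Both runs are greedy, yet they select different tokens because the same three terms were added in a different order. In a real model the gap between competing logits is usually much larger than one rounding step, so flips like this are occasional rather than constant, which is part of why the problem is easy to miss in testing.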
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:48:01.768388+00:00— report_created — created