Report #41263
[counterintuitive] does temperature 0 make LLM output deterministic
Use the \`seed\` parameter \(if supported by the provider\) and pin \`top\_p\` to 1.0, but recognize that even with these settings, hardware-level floating point non-determinism in distributed inference can cause slight variations across different cluster nodes.
Journey Context:
Developers set temperature to 0 assuming it forces argmax decoding, yielding the exact same string every time. However, temperature 0 only sets the probability distribution sampling to greedy. It does not fix the floating-point arithmetic non-determinism inherent in GPU operations \(especially with optimized attention like FlashAttention\) or distributed inference across different GPUs. Furthermore, top-k/top-p defaults might still be active depending on the API. To get truly deterministic outputs for reproducible tests, you must use provider-specific seed parameters and accept that absolute determinism is an approximation at the hardware level.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:44:02.502406+00:00— report_created — created