Report #80649
[counterintuitive] Does setting temperature to 0 make LLM outputs deterministic?
Do not rely on temperature 0 for strict reproducibility. Use the seed parameter (if available), and if you need identical outputs for identical inputs, implement external state tracking or caching.
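A minimal sketch of the caching approach, assuming a hypothetical `generate(model, prompt, params)` callable that wraps whatever client you use. `CachedLLM` and `request_key` are illustrative names, not part of any provider's SDK. The idea is to key each response on a hash of the full request so that a repeated request returns the stored output instead of calling the model again.

```python
import hashlib
import json
from typing import Callable, Dict


def request_key(model: str, prompt: str, params: dict) -> str:
    """Build a stable key from everything that influences the output."""
    # sort_keys=True makes the key independent of parameter ordering.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedLLM:
    """Wraps a generate function (hypothetical) with an in-memory cache."""

    def __init__(self, generate: Callable[[str, str, dict], str]):
        self._generate = generate
        self._cache: Dict[str, str] = {}

    def complete(self, model: str, prompt: str, **params) -> str:
        key = request_key(model, prompt, params)
        if key not in self._cache:
            # Only the first identical request reaches the model.
            self._cache[key] = self._generate(model, prompt, params)
        return self._cache[key]
```

The in-memory dict only lasts for the life of the process. For reproducibility across runs, the same key could index a file or database instead.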
Journey Context:
Developers often assume that temp=0 means argmax (greedy decoding) and is therefore deterministic. However, floating-point addition on GPUs is non-associative, and parallel kernels (especially attention kernels such as FlashAttention) may sum values in a different order depending on the hardware or batch size. The resulting tiny differences in logits can flip the argmax when two tokens are nearly tied. Additionally, some API providers apply a small default top-p or alter the sampling logic in ways that prevent strict argmax, so temp=0 is still subject to infrastructure-level randomness.
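The non-associativity point can be illustrated on the CPU with plain Python floats. This is a toy sketch, not a reproduction of any GPU kernel: the token names and logit values are made up, and `greedy_token` is a hypothetical helper. It shows that summing the same three contributions in a different order changes the result by one unit in the last place, which is enough to change the greedy pick when two logits are nearly tied.

```python
def sum_left_to_right(values):
    """Accumulate in the given order, as a sequential reduction would."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_token(logits: dict) -> str:
    """Greedy decoding: return the token with the highest logit.

    On an exact tie, max() returns the first token in dict order.
    """
    return max(logits, key=logits.get)


# Same three contributions to one logit, summed in two different orders.
contributions = [0.1, 0.2, 0.3]
logit_order_a = sum_left_to_right(contributions)        # (0.1 + 0.2) + 0.3
logit_order_b = sum_left_to_right(contributions[::-1])  # (0.3 + 0.2) + 0.1

# Hypothetical competing token whose logit sits right at the boundary.
pick_a = greedy_token({"no": 0.6, "yes": logit_order_a})
pick_b = greedy_token({"no": 0.6, "yes": logit_order_b})

print(logit_order_a == logit_order_b)  # False: the two sums differ
print(pick_a, pick_b)                  # the greedy pick differs as well
```

Real logits are rarely this close, but over a long generation even one flipped token changes every token that follows it.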
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:58:46.106763+00:00— report_created — created