Report #64145
[counterintuitive] Does temperature 0 make LLM outputs deterministic?
Do not rely on temperature 0 for strict reproducibility across separate API calls. Where the provider offers them, use a fixed sampling seed or inspect the returned logprobs, and pin requests to a specific model snapshot version.
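A quick way to check whether a given setup is actually reproducible is to repeat the same request and count the distinct responses. The sketch below is provider-agnostic: `generate` is a hypothetical callable standing in for whatever client call you use (with temperature 0, any seed, and a pinned snapshot already configured), and `fake_generate` is a made-up stub so the example runs offline.

```python
import hashlib
from collections import Counter
from typing import Callable


def count_distinct_outputs(generate: Callable[[str], str], prompt: str, runs: int = 5) -> Counter:
    """Call `generate` repeatedly and count responses by SHA-256 digest."""
    digests = Counter()
    for _ in range(runs):
        text = generate(prompt)
        digests[hashlib.sha256(text.encode("utf-8")).hexdigest()] += 1
    return digests


# Hypothetical stand-in for a real client call: one of five replies differs.
_replies = iter(["Paris", "Paris", "Paris.", "Paris", "Paris"])


def fake_generate(prompt: str) -> str:
    return next(_replies)


result = count_distinct_outputs(fake_generate, "Capital of France?", runs=5)
is_reproducible = len(result) == 1  # exactly one digest means every run matched
print(len(result), "distinct outputs; reproducible:", is_reproducible)
```

Swapping `fake_generate` for a real client wrapper turns this into a cheap regression check. A single distinct digest over a handful of runs is evidence of stability, not proof of it.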
Journey Context:
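Developers often assume that temp=0 means argmax (greedy) decoding and therefore the exact same string every time. However, floating-point addition is non-associative, so parallel reductions on GPUs can give slightly different results from run to run, depending on the order in which partial sums are combined. When two candidate tokens have nearly equal logits, that small difference can change which one is the argmax, and the output diverges from that token onward. Furthermore, API providers may route requests to different hardware or silently update the underlying model weights. Temperature 0 only guarantees that no random sampling from the probability distribution takes place; the distribution itself is not perfectly stable across infrastructure variations.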
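The effect of summation order can be reproduced on a CPU with deliberately exaggerated numbers. This is an illustrative sketch, not a model of any real GPU kernel: the terms, the two reduction orders, and the helper names are invented for the example.

```python
def reduce_in_order(terms, order):
    """Sum `terms` left to right, visiting indices in the given order."""
    total = 0.0
    for i in order:
        total += terms[i]
    return total


def greedy_pick(logits):
    """Return the index of the largest logit, as greedy decoding would."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical partial sums that make up the logit of token 0.
# Magnitudes are exaggerated so the rounding effect is easy to see.
terms = [1e17, 1.0, -1e17]

logit_run_a = reduce_in_order(terms, (0, 1, 2))  # 1.0 is absorbed by 1e17
logit_run_b = reduce_in_order(terms, (0, 2, 1))  # large terms cancel first

other_logit = 0.5  # hypothetical logit of token 1, identical in both runs

pick_run_a = greedy_pick([logit_run_a, other_logit])
pick_run_b = greedy_pick([logit_run_b, other_logit])

print(logit_run_a, logit_run_b)  # 0.0 1.0
print(pick_run_a, pick_run_b)    # 1 0
```

In real models the differences are far smaller, but the mechanism is the same: the same inputs summed in a different order yield a different logit, and a near-tie between two tokens is enough for greedy decoding to pick a different one.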
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:09:33.695429+00:00— report_created — created