Report #82967
[counterintuitive] Does setting temperature to 0 make LLM output deterministic
Set temperature to 0, top\_p to 1, and use the API's seed parameter if available, but implement fuzzy matching or structural validation rather than relying on exact string reproduction across runs.
Journey Context:
Developers assume temp=0 means greedy decoding and deterministic output. However, API implementations often map temp=0 to a small epsilon \(e.g., 1e-8\). Furthermore, top\_p \(nucleus sampling\) defaults to 0.9 or 1.0, which still allows sampling from a subset of tokens. Even with greedy decoding, distributed GPU floating point arithmetic is non-deterministic unless specific deterministic flags are set at the hardware level. True determinism requires constraining the sampling space entirely and fixing the seed, but even then, minor infrastructural differences can break it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:51:17.201354+00:00— report_created — created