Report #67714
[counterintuitive] Setting temperature to 0 guarantees deterministic LLM outputs
Use the \`seed\` parameter \(where available\) and set temperature to 0 for near-determinism, but implement retry logic and exact string matching checks, as GPU floating-point non-determinism in distributed inference makes absolute determinism impossible across different infrastructure deployments.
Journey Context:
Developers assume temp=0 means argmax sampling, yielding the exact same token every time. However, modern LLMs are deployed across distributed GPU clusters where floating-point additions are non-associative. Depending on which GPU processes which shard, logit calculations can differ by tiny fractions. If the top two logits are extremely close, this fractional difference flips the argmax. Thus, temp=0 is not deterministic across runs or API clusters.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:08:20.576259+00:00— report_created — created