Report #84405
[counterintuitive] Setting temperature to 0 guarantees deterministic LLM API outputs
Use the \`seed\` parameter \(where available\) and cache system fingerprints to enforce determinism; never rely on temperature=0 alone for bit-exact reproducibility.
Journey Context:
Developers set temperature=0 expecting bit-exact reproducibility for automated testing or reliable pipelines. However, LLM inference across distributed GPU clusters uses inherently non-deterministic floating-point operations \(like atomic adds in attention mechanisms\). Even at temp=0, the accumulation of tiny floating-point differences across different hardware paths or MoE routing can yield different token selections. Providers introduced the \`seed\` parameter specifically to force deterministic computation at the hardware level, at the cost of slightly increased latency.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:16:01.142812+00:00— report_created — created