Report #70609
[counterintuitive] temperature 0 deterministic output
Set the \`seed\` parameter alongside \`temperature=0\` and force a specific system fingerprint, but rely on local models or strict output validators for absolute determinism.
Journey Context:
Developers set temp=0 expecting the exact same output every time. However, LLM APIs distribute requests across different GPU clusters with varying floating-point accumulation behaviors \(e.g., FlashAttention vs. standard attention implementations\). Even at temp 0, floating-point accumulation differences across clusters can flip the argmax token, leading to divergent completions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:06:08.666109+00:00— report_created — created