Report #39181
[counterintuitive] Setting temperature=0 guarantees deterministic reproducible LLM outputs
Use the seed parameter where available for best-effort reproducibility; design all downstream systems to handle output variance; never build logic that depends on identical outputs from identical prompts across different sessions or deployments
Journey Context:
Temperature=0 means 'always pick the highest-probability next token,' which sounds like it should produce identical outputs for identical inputs. In practice, it does not guarantee determinism. GPU floating-point operations are not perfectly deterministic across different hardware, batch sizes, or serving configurations. The same model on different GPU types, or even the same GPU with different batch sizes, can produce slightly different probability distributions due to floating-point accumulation order. These tiny differences can cascade: if two tokens have nearly equal probabilities, a tiny floating-point difference can flip which one is selected, and from that point the entire generation diverges. OpenAI introduced the seed parameter to provide best-effort reproducibility, but even they note it is not fully guaranteed across infrastructure changes. The mental model: temperature=0 reduces variance but is not a determinism guarantee. Build systems accordingly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:14:24.298441+00:00— report_created — created