Report #97070
[gotcha] Same prompt with temperature 0 produces different outputs across API calls
Never assume temperature=0 guarantees deterministic output. If you need reproducibility, implement response caching (the same prompt hash returns the cached response) or use the seed parameter where available. Design your UX to handle non-determinism gracefully: the same question can yield different valid answers.
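The caching idea can be sketched as follows. This is a minimal in-memory illustration, not a production design: `cache_key`, `ResponseCache`, and `flaky_model` are hypothetical names, the model call is a stand-in that deliberately returns a different string each time, and the sketch assumes request parameters are JSON-serializable.

```python
import hashlib
import itertools
import json
from typing import Callable, Dict, Optional


def cache_key(prompt: str, model: str, params: dict) -> str:
    """Hash everything that influences the response into one stable key."""
    # sort_keys=True makes the key independent of dict insertion order.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """Return the first response seen for a given (prompt, model, params)."""

    def __init__(self, call_model: Callable[[str, str, dict], str]) -> None:
        self._call_model = call_model
        self._store: Dict[str, str] = {}

    def get(self, prompt: str, model: str, params: Optional[dict] = None) -> str:
        params = params or {}
        key = cache_key(prompt, model, params)
        if key not in self._store:
            # Only the first request for this key reaches the model.
            self._store[key] = self._call_model(prompt, model, params)
        return self._store[key]


# Stand-in for a real API call: returns a different answer on every call,
# imitating non-deterministic output at temperature=0.
_calls = itertools.count(1)


def flaky_model(prompt: str, model: str, params: dict) -> str:
    return f"answer #{next(_calls)} to {prompt!r}"


cache = ResponseCache(flaky_model)
first = cache.get("What is an LLM?", "example-model", {"temperature": 0})
second = cache.get("What is an LLM?", "example-model", {"temperature": 0})
print(first == second)  # True: the second call is served from the cache
```

Including the model name and sampling parameters in the key avoids serving a response that was generated under different settings. A real deployment would likely also need persistence and an expiry policy, both omitted here.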
Journey Context:
A widespread and dangerous assumption is that setting temperature=0 makes LLM outputs deterministic. It does not. Temperature=0 nominally selects the highest-probability token at each step, but (1) floating-point arithmetic varies across GPU hardware, (2) top-p sampling defaults may still introduce variance, (3) model deployments may use different inference backends, and (4) batched vs. unbatched inference can produce different floating-point results. When two candidate tokens have nearly equal probabilities, these tiny numerical differences can flip which one is chosen, and every token after that point then diverges. OpenAI introduced a seed parameter to improve reproducibility, but even their documentation calls it 'mostly deterministic,' not fully deterministic. This matters for UX because users expect the same question to get the same answer, which is a core UI predictability principle. Non-determinism also breaks regression testing, makes A/B comparisons unreliable, and causes confusion when a user re-asks a question and gets a substantively different answer. The fix: treat all LLM outputs as stochastic by default. Cache responses where consistency matters. Design UIs that don't break when the same input produces a different but equally valid output.
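The floating-point point can be illustrated with a toy example. Floating-point addition is not associative, so adding the same numbers in a different order (as different hardware, backends, or batch sizes may do) can change the result in the last bits. The sketch below is not real inference code: `sum_in_order`, `greedy_pick`, and the logit values are hypothetical, chosen so that the tiny difference flips a greedy (argmax) choice.

```python
from typing import List, Sequence


def sum_in_order(terms: Sequence[float]) -> float:
    """Accumulate left to right, as one particular kernel might."""
    total = 0.0
    for term in terms:
        total += term
    return total


def greedy_pick(logits: List[float]) -> int:
    """Index of the largest logit; ties go to the lowest index."""
    return max(range(len(logits)), key=lambda i: logits[i])


terms = [0.1, 0.2, 0.3]
forward = sum_in_order(terms)         # (0.1 + 0.2) + 0.3
backward = sum_in_order(terms[::-1])  # (0.3 + 0.2) + 0.1

print(forward == backward)  # False: same numbers, different order
print(forward, backward)    # 0.6000000000000001 0.6

# A rival token whose logit sits right at the boundary.
rival_logit = 0.6
pick_a = greedy_pick([rival_logit, forward])   # token 1 wins by one rounding step
pick_b = greedy_pick([rival_logit, backward])  # exact tie, so token 0 wins
print(pick_a, pick_b)  # 1 0
```

In a real model the sums involve thousands of terms and the near-ties are rarer, but one flipped token is enough: everything generated after it is conditioned on a different prefix.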
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:30:53.805014+00:00— report_created — created