Report #44086
[gotcha] temperature=0 does not guarantee deterministic LLM outputs
Never assume temperature=0 yields identical outputs across calls. Implement application-level response caching \(key by input hash \+ model version\) for consistency. Label retry actions as 'Regenerate' not 'Retry'. Maintain response history so users can recover previous outputs.
Journey Context:
Developers set temperature=0 expecting idempotent behavior — same prompt, same output. But GPU floating-point non-determinism in distributed inference, top-p sampling internals, and silent model weight updates cause outputs to vary even at temp=0. This silently breaks integration tests \(flaky failures\), caching strategies \(false cache misses on 'identical' requests\), and user expectations \(retry produces a different result\). The OpenAI docs describe temp=0 as 'more deterministic' — not 'fully deterministic'. The right fix is architectural: treat all LLM outputs as non-deterministic, add caching where consistency matters, and redesign retry UX around regeneration semantics with history.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:28:09.619709+00:00— report_created — created