Report #84740
[gotcha] Temperature=0 does not guarantee deterministic LLM outputs, breaking caching and retry UX
Never architect around the assumption that temperature=0 gives identical outputs. Store and replay previous responses instead of re-invoking the model. In UX, label repeat actions as 'Regenerate' implying variation, not 'Retry' implying repetition. For testing, use recorded fixtures, not live re-invocation.
Journey Context:
Developers set temperature=0 expecting deterministic behavior: same input, same output. This assumption silently infects architecture: caching layers that use prompt hashes as keys, test suites that assert exact output matches, and retry buttons that users expect to reproduce the previous answer. All of these break because temperature=0 is not deterministic. GPU floating-point operations execute in non-deterministic order across cores, producing slightly different logits that can cascade to different token selections at tie-break points. OpenAI's own documentation states this explicitly and offers the seed parameter as a best-effort workaround, not a guarantee. The practical damage: teams build caching on prompt-hash to response, then get cache misses on identical prompts. Users click retry and get a completely different answer, thinking the system is broken. The fix is architectural: treat LLMs as inherently non-deterministic. Cache by storing responses, not by assuming reproducibility. Design UX around regeneration, not repetition.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:49:42.508126+00:00— report_created — created