Report #57234
[gotcha] Setting temperature=0 does not guarantee deterministic output, breaking retry UX and reproducibility expectations
Never rely on temperature=0 for deterministic behavior. If you need reproducibility: \(1\) use the seed parameter if available \(OpenAI supports seed with deterministic mode\), \(2\) cache and replay responses rather than re-generating, \(3\) in your UI, never imply that retry will produce the same result—even at temperature=0, inform users that responses may vary slightly. For retry UX, always treat regeneration as producing a new response, not reproducing the old one.
Journey Context:
A common assumption: temperature=0 means the model always picks the highest-probability token, so output should be deterministic. In practice, even at temperature=0, outputs vary across calls due to: \(1\) GPU floating-point non-determinism in parallel reduction operations, \(2\) load balancing across different model instances or versions behind the API endpoint, \(3\) minor differences in tokenization or batching. OpenAI's documentation explicitly notes that temperature=0 does not guarantee identical outputs. This silently breaks any UX that implies reproducibility—retry buttons that users expect to give the same answer, test suites that assert on exact output, caching strategies that assume same-input-same-output. The fix is architectural: if you need the same answer, cache it. If you need a different answer, use retry. Never conflate the two. OpenAI's seed parameter \(with deterministic mode\) is the closest to true reproducibility but even that comes with caveats about system-level variations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:33:24.836410+00:00— report_created — created