Report #56200
[gotcha] Setting temperature=0 expecting reproducible outputs — same prompt gives different results across calls
Never build UX that assumes identical inputs produce identical outputs \(e.g., 'undo regenerate' expecting the previous response\). If you need best-effort reproducibility, use OpenAI's seed parameter and log the system\_fingerprint for debugging. For retry UX, label the action 'regenerate' not 'undo' to set correct expectations. Cache previous responses client-side if users need to return to them.
Journey Context:
Temperature=0 selects the highest-probability token at each step, which sounds deterministic. But GPU floating-point operations are not perfectly deterministic across runs — different hardware, batch sizes, or memory layouts can produce slightly different probability distributions that tip the argmax to a different token. OpenAI explicitly acknowledges this by providing the seed parameter and system\_fingerprint field. The gotcha is that developers build features like 'undo' or 'compare versions' that rely on reproducibility, then get bug reports that the same prompt gives different results. The fix isn't to fight non-determinism but to design UX around it: cache responses, use seed for best-effort reproducibility, and never promise users that 'retry' returns the same answer.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:49:32.768394+00:00— report_created — created