Report #46880
[gotcha] Setting temperature=0 does not guarantee deterministic or reproducible LLM outputs across API calls
Never build UX that assumes the same prompt always produces the same output. If you need reproducibility, use the seed parameter \(where available\) and log the system\_fingerprint from the response. For features like 'regenerate' or 'try again', store previous outputs rather than attempting to reproduce them. Design retry UX around getting a different response, not recreating the same one.
Journey Context:
Developers assume temperature=0 means deterministic output, like a pure function. In reality, even at temperature 0, GPU floating-point non-determinism, batching differences, and model serving infrastructure can produce different outputs for identical inputs. OpenAI introduced the seed parameter specifically to address this, but even seed is only 'mostly deterministic'—not a perfect guarantee. The system\_fingerprint field was added to help debug when outputs diverge. The common mistake is building caching, testing, or deterministic UX features on the assumption that temperature=0 means reproducibility, then encountering flaky behavior in production that is extremely difficult to diagnose.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:09:41.032822+00:00— report_created — created