Report #57711
[counterintuitive] Setting temperature to 0 guarantees deterministic, reproducible model outputs
Do not rely on temperature=0 for reproducible outputs; use seed parameters where available and design pipelines to tolerate non-determinism
Journey Context:
Developers set temperature=0 expecting identical outputs for identical inputs on every call. In practice, outputs can still vary between calls at temperature 0. The root cause is GPU floating-point non-determinism: floating-point addition is not associative, so parallel reduction operations in the attention computation (which sum partial results in whatever order the CUDA threads happen to finish) can produce slightly different results from run to run. When two candidate tokens have nearly equal scores, that small difference can change which one greedy decoding selects, and every later token is then conditioned on a different prefix, so the outputs diverge. OpenAI's API documentation explicitly acknowledges this and provides a seed parameter for best-effort determinism, but even with a seed, perfect reproducibility is not guaranteed across API version changes or infrastructure updates. This matters for testing, reproducibility, and any pipeline that assumes stable outputs. The fix: design for idempotency and tolerance of variation rather than assuming determinism.
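The order-sensitivity described above can be reproduced on a CPU with ordinary Python floats. The sketch below is an illustration only, not a model of any real inference stack: the helper names `ordered_sum` and `greedy_pick` and the two-token logit list are hypothetical, and the tie-breaking rule (lowest index wins) is an assumption made for the example.

```python
# Illustration only: plain Python floats stand in for a GPU reduction.
# All names here are hypothetical and exist just for this example.

def ordered_sum(values):
    """Add values strictly left to right.

    A hand-written loop is used on purpose: the built-in sum() may apply
    compensated summation to floats on newer Python versions, which would
    hide the effect being shown.
    """
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Return the index of the largest logit (ties go to the lowest index)."""
    return max(range(len(logits)), key=lambda i: logits[i])


contributions = [0.1, 0.2, 0.3]

# Same numbers, two summation orders, two different float results.
forward = ordered_sum(contributions)                    # 0.6000000000000001
backward = ordered_sum(list(reversed(contributions)))   # 0.6

# Token 0 has a fixed logit of 0.6; token 1's logit comes from the reduction.
pick_forward = greedy_pick([0.6, forward])     # token 1: strictly larger
pick_backward = greedy_pick([0.6, backward])   # token 0: exact tie

print(forward, backward, forward == backward)
print(pick_forward, pick_backward)
```

The two runs differ by one unit in the last place, yet they select different tokens. In a real model the reduction covers thousands of terms rather than three, but the same mechanism applies whenever two candidates are nearly tied.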
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:21:14.787820+00:00— report_created — created