Report #79902
[counterintuitive] Why are my API calls at temperature 0 returning different results, and how do I make them fully deterministic?
Accept that temperature 0 is not fully deterministic across different hardware, batch sizes, or deployment versions; use seed parameters where offered for best-effort reproducibility, but design systems to be robust to minor output variation.
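One way to read "robust to minor output variation" is to stop comparing raw model output strings byte for byte. The sketch below is only an illustration under that assumption: `normalize` and `equivalent` are hypothetical helper names, not part of any API, and the right notion of "equivalent" depends on the application.

```python
import json


def normalize(text: str) -> str:
    """Collapse runs of whitespace and ignore case (an assumed policy)."""
    return " ".join(text.split()).casefold()


def equivalent(a: str, b: str) -> bool:
    """Compare two model outputs without requiring bit-exact equality.

    If both outputs parse as JSON, compare the parsed values so that key
    order and spacing do not matter. Otherwise fall back to a normalized
    text comparison.
    """
    try:
        return json.loads(a) == json.loads(b)
    except json.JSONDecodeError:
        return normalize(a) == normalize(b)


# Structured output: same content, different key order and spacing.
same_json = equivalent('{"a": 1, "b": 2}', '{"b":2,"a":1}')
# Free text: same content, different case and spacing.
same_text = equivalent("The answer is  42.", "the answer is 42.")
# Genuinely different answers still compare as different.
different = equivalent("yes", "no")
```

Tests and caches built on a comparison like this tolerate cosmetic drift between runs while still flagging answers that actually changed.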
Journey Context:
Developers set temperature to 0 expecting bit-exact determinism. Temperature 0 means 'always pick the highest-probability token', but it does not guarantee the same probability distribution across runs. Floating-point addition is not associative, so GPU operations in attention computation (particularly reductions, whose order can change across CUDA versions, GPU architectures, batch sizes, or even memory alignment) can produce slightly different logits. When two tokens have near-identical probabilities, a tiny floating-point difference flips which one is 'most probable.' Because each generated token is fed back as input, one flipped token can change everything that follows it. This is a hardware/infrastructure limitation, not a model or prompt issue. OpenAI's seed parameter provides best-effort reproducibility but explicitly does not guarantee it across model version changes.
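The effect described above can be reproduced on a CPU with ordinary Python floats. This is a toy sketch, not a model of any real inference stack: the term values, the rival logit, and the helper names `reduce_in_order` and `greedy_pick` are all made up for illustration. It shows that summing the same numbers in a different order changes the result, and that the change is enough to flip a greedy (temperature 0) choice between two close candidates.

```python
def reduce_in_order(terms):
    """Sum left to right with plain float adds.

    An explicit loop is used on purpose: the built-in sum() may apply
    compensated summation on newer Python versions, which would hide
    the effect being demonstrated.
    """
    total = 0.0
    for t in terms:
        total += t
    return total


def greedy_pick(logits):
    """Temperature 0: return the token with the highest logit."""
    return max(logits, key=logits.get)


# Hypothetical partial sums that add up to one token's logit.
terms = [1e8, 1e-8, -1e8]
reordered = [1e8, -1e8, 1e-8]  # same values, different reduction order

logit_run_1 = reduce_in_order(terms)      # small term is rounded when added to 1e8
logit_run_2 = reduce_in_order(reordered)  # large terms cancel first, small term survives

rival = 1.2e-8  # hypothetical logit of a competing token

pick_run_1 = greedy_pick({"yes": logit_run_1, "no": rival})
pick_run_2 = greedy_pick({"yes": logit_run_2, "no": rival})

print(logit_run_1, logit_run_2)
print(pick_run_1, pick_run_2)
```

The two "runs" use identical inputs and identical greedy decoding; only the order of the additions differs, which is the kind of change a different batch size or kernel can introduce on a GPU.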
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:42:52.993558+00:00— report_created — created