Report #57350
[counterintuitive] Setting temperature to 0 guarantees identical outputs across runs
Use the seed parameter alongside temperature=0 for best-effort reproducibility. For hard determinism requirements, cache and replay outputs rather than regenerating. Never build systems that assume temperature=0 implies determinism.
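A minimal sketch of the cache-and-replay approach, under stated assumptions: `ReplayCache`, `flaky_generate`, and the in-memory dictionary are hypothetical names introduced here for illustration, and the real model call is replaced by a stand-in that deliberately returns a different string on each call. A production system would persist the store and pass `temperature` and `seed` through to the actual API client.

```python
import hashlib
import itertools
import json
from typing import Callable, Dict


class ReplayCache:
    """Stores the first output for each request and replays it afterwards.

    Hypothetical helper: the store is an in-memory dict, so replay only
    holds for the lifetime of this object.
    """

    def __init__(self, generate: Callable[[str, dict], str]) -> None:
        self._generate = generate
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(prompt: str, params: dict) -> str:
        # Canonical JSON so that parameter ordering does not change the key.
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, prompt: str, params: dict) -> str:
        key = self._key(prompt, params)
        if key not in self._store:
            # First request: generate once and record the result.
            self._store[key] = self._generate(prompt, params)
        # Every later request replays the recorded output verbatim.
        return self._store[key]


# Stand-in for a model call (assumption): returns a different string each time,
# mimicking a backend that is not reproducible even with temperature=0.
_call_counter = itertools.count(1)


def flaky_generate(prompt: str, params: dict) -> str:
    return f"response #{next(_call_counter)} to {prompt!r}"


cache = ReplayCache(flaky_generate)
params = {"temperature": 0, "seed": 1234}

first = cache.complete("Summarize the report.", params)
second = cache.complete("Summarize the report.", params)
uncached = flaky_generate("Summarize the report.", params)
```

Here `first` and `second` are identical because the second call is served from the store, while `uncached` differs because it bypasses the cache. The design choice is that determinism comes from storage rather than from the model, so it holds regardless of hardware or backend changes.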
Journey Context:
Temperature=0 selects the highest-probability token at each step (greedy decoding), so developers reasonably assume outputs will be identical across runs. In practice, floating-point addition is not associative, and parallel reductions on GPUs (in softmax and attention computations, for example) can execute in different orders across runs, producing slightly different probability values. When two candidate tokens are nearly tied, these micro-differences can flip the top token, and because every later token is conditioned on that choice, the outputs diverge from that point on. Differences in hardware, CUDA versions, batching configurations, and deployment infrastructure all contribute. OpenAI explicitly acknowledges this and introduced the seed parameter to enable 'mostly deterministic' outputs, but even seed provides only best-effort reproducibility, not a guarantee. The fundamental issue is that floating-point computation on parallel hardware is not perfectly deterministic, and this leaks through to token selection.
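The non-associativity described above can be reproduced on a CPU with ordinary IEEE 754 doubles. The sketch below is a toy illustration, not a model of any real inference stack: `ordered_sum` and `greedy_pick` are hypothetical helpers, and the "logits" are invented numbers chosen so that two candidates are nearly tied. An explicit loop is used instead of the built-in `sum`, since recent Python versions apply compensated summation to floats and would hide the effect.

```python
from typing import Sequence


def ordered_sum(terms: Sequence[float]) -> float:
    """Adds the terms strictly left to right, one rounding per addition."""
    total = 0.0
    for term in terms:
        total += term
    return total


def greedy_pick(logit_a: float, logit_b: float) -> str:
    """Greedy (temperature=0 style) choice between two candidate tokens."""
    return "A" if logit_a > logit_b else "B"


# The same three contributions to candidate A's logit, reduced in two orders.
contributions = [0.1, 0.2, 0.3]
logit_a_forward = ordered_sum(contributions)
logit_a_reverse = ordered_sum(list(reversed(contributions)))

# Candidate B is a near tie with A (made-up value for the demonstration).
logit_b = 0.6

pick_forward = greedy_pick(logit_a_forward, logit_b)
pick_reverse = greedy_pick(logit_a_reverse, logit_b)

print(logit_a_forward, logit_a_reverse)  # 0.6000000000000001 0.6
print(pick_forward, pick_reverse)        # A B
```

The two reductions differ only in the last bit, yet that is enough to change which token wins the comparison. On a GPU the reduction order is not fixed by the program, which is why the same effect can appear between otherwise identical runs.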
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:44:54.857927+00:00 — report_created — created