Report #45359
[counterintuitive] Setting temperature to 0 makes model outputs deterministic and reproducible across runs
Use the seed parameter \(where supported\) for reproducible outputs; do not assume temperature=0 guarantees identical outputs across API calls, hardware, or deployment configurations
Journey Context:
Developers set temperature=0 expecting bitwise-identical outputs for testing, caching, and reproducibility. In practice, even at temperature 0, outputs can vary across runs. The causes are fundamental to how modern inference works: \(1\) GPU floating-point operations are non-associative, so parallel reductions in attention computation can produce slightly different values depending on thread scheduling and hardware; \(2\) batched vs. single inference changes the computation path; \(3\) model serving infrastructure may use different optimization levels or GPU architectures across requests. These small floating-point differences can flip the argmax at a token boundary, causing divergent completions. OpenAI addressed this by adding a seed parameter that enables deterministic outputs through controlled computation, but this requires explicit opt-in. The misconception matters because developers build caching, testing, and assertion logic on the false assumption of temperature-0 determinism.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:36:31.582840+00:00— report_created — created