Report #70901
[counterintuitive] Why are temperature=0 outputs not reproducible across runs?
Do not assume temperature=0 guarantees deterministic outputs. Use platform-specific seed parameters where available (e.g., the OpenAI seed parameter), and design systems to tolerate output variation. In tests, check for semantic equivalence rather than exact string matches.
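A minimal sketch of a tolerant test comparison, using only the Python standard library. The helper names (normalize, roughly_equivalent) and the 0.9 threshold are hypothetical, chosen for illustration. Note that difflib measures lexical similarity, which is only a crude stand-in for semantic equivalence; a real test suite might instead compare parsed structured output or embedding similarity.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def roughly_equivalent(a: str, b: str, threshold: float = 0.9) -> bool:
    """Return True when two outputs are lexically close after normalization.

    The threshold is an assumed value; tune it for your own outputs.
    """
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


# Two runs that differ only in casing and spacing pass the tolerant check,
# while an exact string comparison would fail.
run_1 = "The capital of France is Paris."
run_2 = "the capital of  France is Paris."
print(run_1 == run_2)                     # exact match fails
print(roughly_equivalent(run_1, run_2))   # tolerant match passes
```

An assertion built on roughly_equivalent fails only when the outputs diverge substantially, which is usually the condition a regression test cares about.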
Journey Context:
The common belief is that setting temperature to 0 makes the model deterministic: same input, same output, every time. In practice, outputs can vary across runs even at temperature 0. The reasons lie in how the computation is executed, not in the sampling parameters: (1) floating-point addition on GPUs is non-associative, so parallel reductions in the attention computation can produce slightly different results depending on thread scheduling; (2) different GPU architectures, drivers, and library versions can select different kernels and reduction orders, so results may differ at the level of rounding; (3) batch size and padding affect which computation paths are taken, so the same prompt can be computed differently depending on what else is in the batch; (4) even tiny floating-point differences in early tokens can change which token wins an argmax, and because each token conditions the next, the difference can cascade into an entirely different output. OpenAI introduced a seed parameter specifically to address this, but their own documentation describes it as 'mostly deterministic' with best-effort guarantees, not absolute ones. True determinism in autoregressive models over long sequences requires bit-level reproducibility of the underlying hardware and kernels, which standard GPU inference stacks do not guarantee by default.
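Points (1) and (4) can be illustrated on a CPU with plain Python floats. The sketch below is a toy, not a model of any real inference stack: the two-token vocabulary, the logit values, and the function names are all hypothetical. It sums the same three numbers in two orders, as two different reduction schedules might, and shows that a greedy (argmax) pick can flip as a result.

```python
def sequential_sum(values):
    """Accumulate in the given order, standing in for one fixed reduction order."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Return the first token with the highest logit (argmax, as at temperature 0)."""
    return max(logits, key=logits.get)


contributions = [0.1, 0.2, 0.3]

# Same numbers, two summation orders: the results differ in the last bit.
logit_forward = sequential_sum(contributions)
logit_reverse = sequential_sum(reversed(contributions))
print(logit_forward, logit_reverse)  # 0.6000000000000001 0.6

# Hypothetical vocabulary: "no" has a fixed logit of 0.6, "yes" gets the summed logit.
pick_forward = greedy_pick({"no": 0.6, "yes": logit_forward})
pick_reverse = greedy_pick({"no": 0.6, "yes": logit_reverse})
print(pick_forward, pick_reverse)  # yes no
```

In the reverse order the two logits tie exactly and the first entry wins, so the pick changes from "yes" to "no". Real models have far larger margins between most candidate tokens, but near-ties do occur, and a single flipped token changes the context for every token after it.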
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:35:27.186194+00:00— report_created — created