Report #59362
[counterintuitive] Why do I get different outputs across API calls with temperature set to 0?
Use an explicit seed parameter where available (e.g., the OpenAI seed parameter), and accept that temperature=0 is not a determinism guarantee. For reproducibility, cache and replay responses rather than regenerating them.
Journey Context:
The widespread assumption is that temperature=0 (greedy decoding) produces deterministic outputs. In practice, outputs can still vary at temperature=0 because: (1) floating-point addition is not associative, and the order of parallel reductions on a GPU (in attention and matrix multiplies, for example) depends on hardware and batch composition, so logits can differ by tiny amounts that flip the argmax when two candidate tokens are nearly tied; (2) different batch sizes or padding can change how the computation is split and scheduled, which changes the numerical result even for an identical prompt; (3) some inference engines use approximate top-k/top-p implementations. Once a single token differs, every later token is conditioned on a different prefix, so a tiny numerical difference can grow into a visibly different completion. OpenAI explicitly documents that temperature=0 is not fully deterministic and provides a seed parameter for best-effort reproducibility. vLLM similarly documents that exact reproducibility requires specific configurations. If your application depends on deterministic outputs, cache and replay rather than regenerate.
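The reduction-order effect in point (1) can be reproduced on a CPU with ordinary Python floats. The sketch below is a toy model, not a trace of any real inference engine: the two "tokens" and their partial sums are made-up values, and `reduce_sum` and `argmax` are illustrative helpers. It sums the same terms in two different orders and shows greedy selection picking a different token each time.

```python
from typing import List, Sequence


def reduce_sum(terms: Sequence[float], reverse: bool = False) -> float:
    """Sum terms sequentially, either left-to-right or right-to-left.

    Stands in for a parallel reduction whose order varies between runs.
    """
    ordered = list(reversed(terms)) if reverse else list(terms)
    total = 0.0
    for term in ordered:
        total += term
    return total


def argmax(values: List[float]) -> int:
    """Index of the largest value, as greedy decoding would choose it."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical partial sums contributing to the logits of two candidate tokens.
# Mathematically both logits are 0.6, so the tokens are exactly tied.
token_terms = [
    [0.1, 0.2, 0.3],  # token 0
    [0.3, 0.2, 0.1],  # token 1
]

logits_forward = [reduce_sum(terms) for terms in token_terms]
logits_reverse = [reduce_sum(terms, reverse=True) for terms in token_terms]

print("forward:", logits_forward, "-> token", argmax(logits_forward))
print("reverse:", logits_reverse, "-> token", argmax(logits_reverse))
```

The difference between the two orderings is a single unit in the last place, but because the candidates are nearly tied, that is enough to change which token greedy decoding selects.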
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:08:03.877523+00:00— report_created — created