Report #64055
[counterintuitive] Why are my temperature=0 LLM outputs different across identical runs?
Never rely on temperature=0 alone for reproducibility. Use the seed parameter where available, implement external idempotency checks, and design your pipeline to tolerate non-determinism. If you need bit-identical outputs, cache and replay rather than re-generating.
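The cache-and-replay advice above can be sketched roughly as follows. This is a minimal illustration, not a production design: `ReplayCache`, `request_key`, and `fake_generate` are hypothetical names invented for this example, the store is an in-memory dict, and `fake_generate` stands in for whatever client call your pipeline really makes.

```python
import hashlib
import json
from typing import Callable, Dict


def request_key(model: str, prompt: str, params: dict) -> str:
    """Hash everything that defines the request into a stable cache key."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,  # key must not depend on dict insertion order
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ReplayCache:
    """Generate once per distinct request, then replay the stored text."""

    def __init__(self, generate: Callable[[str, str, dict], str]):
        self._generate = generate
        self._store: Dict[str, str] = {}  # assumption: in-memory only

    def complete(self, model: str, prompt: str, params: dict) -> str:
        key = request_key(model, prompt, params)
        if key not in self._store:
            # First time we see this request: call the model and record it.
            self._store[key] = self._generate(model, prompt, params)
        # Every later call returns the recorded bytes, never a re-generation.
        return self._store[key]


# Hypothetical stand-in for a real LLM call; it deliberately returns a
# different string on every invocation to mimic non-deterministic output.
calls = {"count": 0}


def fake_generate(model: str, prompt: str, params: dict) -> str:
    calls["count"] += 1
    return f"response #{calls['count']}"


cache = ReplayCache(fake_generate)
params = {"temperature": 0, "seed": 1234}
first = cache.complete("example-model", "Summarize the report.", params)
second = cache.complete("example-model", "Summarize the report.", params)
print(first == second, calls["count"])
```

A real pipeline would likely persist the store (file, database, object storage) and include anything else that affects the output, such as the system prompt or tool definitions, in the key.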
Journey Context:
Developers widely believe that setting temperature=0 makes LLM outputs deterministic. It does not. Temperature=0 means greedy decoding: the highest-probability token is selected at each step. But floating-point addition is not associative, and parallel reductions on GPUs can sum the same values in a different order from run to run. Different GPU allocations, batch sizes, or hardware therefore produce slightly different logits, and at positions where the top candidates are nearly tied, that difference can flip the argmax. Once one token differs, every later token is conditioned on a different prefix, so the outputs diverge from that point on. OpenAI's own documentation recommends using seed alongside temperature=0, implicitly acknowledging that temperature=0 alone is insufficient. Even with seed, results may vary across API versions or infrastructure changes. The mental model: temperature controls the shape of the sampling distribution, but determinism requires both zero stochasticity AND identical floating-point computation paths, and the latter is outside your control on shared API infrastructure.
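The reduction-order effect described above can be reproduced on a CPU with plain Python floats. The sketch below is a toy illustration, not a model of any real inference stack: the two-token vocabulary, the logit values, and the helper names are made up for the example. It assumes ties resolve to the lowest index, as `numpy.argmax` does. It uses `functools.reduce` rather than the built-in `sum`, because recent Python versions use compensated summation for floats in `sum`, which would hide the effect.

```python
from functools import reduce
from operator import add


def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=values.__getitem__)


# Partial contributions to one token's logit (hypothetical numbers).
terms = [0.1, 0.2, 0.3]

# Same terms, two summation orders, as two different parallel
# reduction schedules might produce.
logit_run1 = reduce(add, terms)            # (0.1 + 0.2) + 0.3
logit_run2 = reduce(add, reversed(terms))  # (0.3 + 0.2) + 0.1

# A competing token whose logit is nearly tied with the one above.
rival_logit = 0.6

vocab = ["rival", "candidate"]
token_run1 = vocab[argmax([rival_logit, logit_run1])]
token_run2 = vocab[argmax([rival_logit, logit_run2])]

print(repr(logit_run1), repr(logit_run2))
print(token_run1, token_run2)
```

The two sums differ only in the last bit, yet greedy decoding picks a different token in each run. In a real model the same thing happens only at positions where the leading candidates are extremely close, which is why most tokens match and the divergence appears sporadically.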
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:59:59.049068+00:00— report_created — created