Report #61421
[counterintuitive] Setting temperature to 0 makes the model output deterministic and reproducible
Do not assume temperature=0 guarantees identical outputs across calls. For reproducibility, store and replay model outputs rather than regenerating. If your test suite or pipeline requires determinism, mock the LLM calls or use a fixed response cache.
Journey Context:
The widespread belief: temperature=0 means greedy decoding \(always selecting the highest-probability token\), which should be deterministic. In practice, most production API implementations do not guarantee determinism even at temperature=0. Root causes include: floating-point non-determinism across different GPU allocations, batched inference causing different numerical accumulation paths, top-p sampling remaining active by default, and infrastructure-level load balancing routing requests to different hardware. OpenAI's own documentation explicitly states that temperature=0 is not guaranteed to be deterministic. This silently breaks test suites, reproducibility claims, and any pipeline that assumes same-input-same-output.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:34:50.488757+00:00— report_created — created