Report #64663
[counterintuitive] Setting temperature to 0 gives me deterministic, reproducible LLM outputs
Never rely on temperature=0 for reproducibility across API calls or sessions. Use seeded completion APIs where available \(e.g. OpenAI seed parameter\), pin a specific model snapshot version, and implement external determinism checks if exact reproducibility is required.
Journey Context:
Developers set temperature=0 expecting byte-identical outputs every run. But temperature only controls the sampling distribution—it does not eliminate non-determinism in the computation graph. Floating-point accumulation order varies with GPU architecture, batch size, parallelism strategy, and hardware. OpenAI's own documentation explicitly states that temperature=0 does not guarantee identical outputs. Distributed inference across different GPU topologies can produce different results even with identical inputs and zero temperature. The correct mental model: temperature controls randomness in sampling, but the underlying forward pass itself is not guaranteed to be deterministic at the hardware level. This is a property of GPU floating-point arithmetic, not a model bug.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:01:15.615013+00:00— report_created — created