Report #68500
[counterintuitive] Setting temperature=0 gives deterministic reproducible outputs
Never rely on temperature=0 for reproducibility across runs. If you need identical outputs, use the seed parameter (where available) with the same model version and deployment, or implement external caching and idempotency at the application layer.
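One way to picture the application-layer caching suggested above is the minimal sketch below. It assumes the first response for a given request is stored and replayed, so reproducibility no longer depends on the model backend. The names request_key, CachedCompleter, and flaky_backend are hypothetical and invented for illustration; they are not part of any provider SDK, and a real system would likely use a persistent store rather than an in-memory dict.

```python
import hashlib
import itertools
import json
from typing import Callable, Dict


def request_key(model: str, prompt: str, params: dict) -> str:
    """Build a stable hash of everything that identifies a request.

    Assumes params is JSON-serializable; sort_keys makes the hash
    independent of keyword-argument order.
    """
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedCompleter:
    """Hypothetical wrapper: call the backend once per unique request."""

    def __init__(self, generate: Callable[[str, str, dict], str]) -> None:
        self._generate = generate
        self._cache: Dict[str, str] = {}
        self.misses = 0

    def complete(self, model: str, prompt: str, **params) -> str:
        key = request_key(model, prompt, params)
        if key not in self._cache:
            # First time this exact request is seen: call the backend.
            self.misses += 1
            self._cache[key] = self._generate(model, prompt, params)
        # Every later identical request replays the stored output.
        return self._cache[key]


# Stand-in backend that returns a different string on every call,
# mimicking a model whose outputs are not reproducible.
_counter = itertools.count(1)


def flaky_backend(model: str, prompt: str, params: dict) -> str:
    return f"completion #{next(_counter)}"


completer = CachedCompleter(flaky_backend)
first = completer.complete("example-model", "Hello", temperature=0)
second = completer.complete("example-model", "Hello", temperature=0)
print(first == second, completer.misses)  # True 1
```

The design choice here is that determinism comes from the cache key, not from the sampler: any change to the model name, prompt, or parameters produces a new key and a fresh backend call.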
Journey Context:
Temperature 0 selects the highest-probability token at each step (greedy decoding), which sounds deterministic. But floating-point arithmetic on GPUs is not guaranteed to be bit-for-bit reproducible: floating-point addition is not associative, so the same matrix multiplication can yield slightly different results depending on hardware, CUDA version, or parallelism configuration, each of which can change the order in which partial results are accumulated. When two candidate tokens have nearly equal scores, these tiny numerical differences can flip the argmax at any step, and the output then diverges from that point forward because every later token is conditioned on a different prefix. OpenAI's API documentation explicitly states that temperature=0 does not guarantee deterministic outputs, and that even the seed parameter offers reproducibility only on a best-effort basis and only when the model version and deployment are unchanged. The correct mental model: temperature controls the shape of the sampling distribution, but it does not control the hardware's numerical behavior. Treating temperature=0 as a deterministic switch leads to flaky tests, unreproducible bugs, and broken caching assumptions.
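The argmax flip described above can be reproduced on a CPU with plain Python floats. The sketch below is a deliberately exaggerated toy, not a measurement of any real model: the values, the two-token "vocabulary", and the helper names accumulate and argmax are all made up for illustration. It sums the same three terms in two different orders, standing in for two different GPU reduction orders, and shows greedy decoding picking a different token each time.

```python
def accumulate(terms):
    """Sum left to right with ordinary float addition.

    A hand-written loop is used so the accumulation order is explicit.
    """
    total = 0.0
    for term in terms:
        total += term
    return total


def argmax(values):
    """Index of the largest value, as greedy (temperature=0) decoding picks."""
    return max(range(len(values)), key=lambda i: values[i])


# Floating-point addition is not associative: same numbers, different grouping.
small_a = (0.1 + 0.2) + 0.3
small_b = 0.1 + (0.2 + 0.3)
print(small_a == small_b)  # False

# Exaggerated case: the same three terms accumulated in two orders.
terms = [1e17, 1.0, -1e17]      # the 1.0 is absorbed by 1e17 and lost
reordered = [1e17, -1e17, 1.0]  # the large terms cancel first, 1.0 survives

logit_run_1 = accumulate(terms)      # 0.0
logit_run_2 = accumulate(reordered)  # 1.0

# A rival token whose score sits between the two results.
rival_logit = 0.5

choice_run_1 = argmax([logit_run_1, rival_logit])
choice_run_2 = argmax([logit_run_2, rival_logit])
print(choice_run_1, choice_run_2)  # 1 0
```

In a real model the discrepancies are typically in the last few bits rather than a whole unit, so a flip only happens when two candidate tokens are almost tied. Over a long generation, though, one such near-tie is enough to send the rest of the output down a different path.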
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:27:40.769934+00:00— report_created — created