Report #55468
[counterintuitive] Why are temperature 0 outputs not deterministic or reproducible across calls?
Never rely on temperature=0 for reproducibility guarantees. Use the seed parameter where available (e.g., OpenAI's seed parameter) and treat it as best-effort, pin exact model versions, and implement external deduplication or caching if identical outputs are required.
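The external caching suggested above can be sketched as follows. This is a minimal in-memory illustration, not a provider API: request_key, ResponseCache, and the call_model callable are hypothetical names introduced here, and call_model stands in for whatever client function actually sends the request.

```python
import hashlib
import json


def request_key(model, messages, params):
    """Build a stable SHA-256 key from the pinned model name, messages, and parameters."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,            # key must not depend on dict insertion order
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """Return the first stored response for a request instead of calling the model again."""

    def __init__(self, call_model):
        # call_model is a hypothetical callable: (model, messages, params) -> str
        self._call_model = call_model
        self._store = {}

    def complete(self, model, messages, params):
        key = request_key(model, messages, params)
        if key not in self._store:
            # Only the first request with this key reaches the model.
            self._store[key] = self._call_model(model, messages, params)
        return self._store[key]
```

Because the model name is part of the key, changing the pinned model version yields a new cache entry rather than silently reusing output from the old one. A persistent store (a database or on-disk key-value store) would be needed for reproducibility across processes; the dictionary here is only for illustration.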
Journey Context:
The widespread assumption is that temperature=0 means greedy decoding, and that greedy decoding is deterministic: same input, same output, every time. This breaks down for several non-obvious reasons. First, floating-point addition is non-associative, so parallel reductions on a GPU can produce slightly different sums depending on the order in which partial results are combined, which can vary with thread scheduling. At ties or near-ties this is enough to flip the top-probability token, and once one token differs the rest of the completion can diverge. Second, different API deployments may use different hardware (NVIDIA A100 vs H100 vs TPU) with different floating-point behavior. Third, model version updates (even unannounced weight changes) alter behavior. Fourth, some providers apply top-k or nucleus sampling modifications even at temperature 0. OpenAI documents that temperature 0 does not guarantee identical outputs and provides a separate seed parameter for reproducibility, but seeded sampling is best-effort rather than guaranteed, and outputs can only be expected to match while the model version and backend configuration (reported in the system_fingerprint response field) stay the same.
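The first reason can be reproduced on a CPU in a few lines. The sketch below is a toy illustration, not real inference code: sum_in_order and greedy_pick are hypothetical helpers, and the two "logits" are made-up values chosen to sit at a near-tie.

```python
def sum_in_order(terms, order):
    """Accumulate terms left to right in the given index order."""
    total = 0.0
    for i in order:
        total += terms[i]
    return total


def greedy_pick(logits):
    """Return the index of the largest logit; ties go to the lowest index."""
    return max(range(len(logits)), key=lambda i: logits[i])


terms = [0.1, 0.2, 0.3]

# Same three terms, two accumulation orders.
logit_forward = sum_in_order(terms, [0, 1, 2])  # (0.1 + 0.2) + 0.3 -> 0.6000000000000001
logit_reverse = sum_in_order(terms, [2, 1, 0])  # (0.3 + 0.2) + 0.1 -> 0.6

# Token 0 has a fixed logit of 0.6; token 1's logit depends on summation order.
pick_forward = greedy_pick([0.6, logit_forward])  # token 1 wins by one rounding step
pick_reverse = greedy_pick([0.6, logit_reverse])  # exact tie, so token 0 wins

print(logit_forward == logit_reverse)  # False
print(pick_forward, pick_reverse)      # 1 0
```

The two sums differ only in the last bit, yet the greedy choice changes. In a real model the same effect arises when a reduction over many terms is combined in a different order from one call to the next.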
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:35:54.043523+00:00— report_created — created