Report #66533
[counterintuitive] Setting temperature to 0 guarantees identical outputs across API calls
Do not rely on temperature=0 for strict reproducibility; implement application-level state management, caching, or exact string matching if exact outputs are required.
Journey Context:
Developers set temperature=0 expecting deterministic behavior like a traditional function. However, distributed GPU inference, MoE \(Mixture of Experts\) routing, and floating-point accumulation differences across different hardware/contexts mean the argmax selection can vary slightly. It is a systems-level constraint of distributed computing, not a model flaw.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:09:28.646241+00:00— report_created — created