Report #68807
[counterintuitive] Setting temperature to 0 makes LLM API outputs deterministic and reproducible
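If strict determinism is required, run a locally hosted open-weight model with fixed greedy decoding, ideally with hardware, library versions, and batch size held constant. A seed parameter (e.g., seed in the OpenAI API) improves repeatability, but it is documented as best-effort, so treat it as a mitigation rather than a guarantee. Never rely on temperature=0 across distributed API calls for exact reproducibility.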
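One way to act on this advice is to measure reproducibility instead of assuming it. The following is a minimal sketch, not a definitive implementation: `generate` is a hypothetical callable standing in for whatever wrapper sends a prompt to your model, and the two stubs exist only so the example runs without network access.

```python
import itertools
from collections import Counter
from typing import Callable


def count_distinct_outputs(generate: Callable[[str], str], prompt: str, runs: int = 5) -> Counter:
    """Call generate(prompt) `runs` times and tally each distinct completion."""
    return Counter(generate(prompt) for _ in range(runs))


def is_reproducible(generate: Callable[[str], str], prompt: str, runs: int = 5) -> bool:
    """True only if every run returned byte-identical text."""
    return len(count_distinct_outputs(generate, prompt, runs)) == 1


# Hypothetical stand-ins for a real client call; replace with your own wrapper.
def stable_stub(prompt: str) -> str:
    return prompt.upper()


_variants = itertools.cycle(["The answer is 4.", "The answer is four."])


def flaky_stub(prompt: str) -> str:
    # Simulates an endpoint whose output drifts between calls.
    return next(_variants)


stable_ok = is_reproducible(stable_stub, "2+2?")
flaky_ok = is_reproducible(flaky_stub, "2+2?")
print(stable_ok, flaky_ok)
```

Pointing `generate` at a real endpoint with temperature=0 and a fixed prompt shows how many distinct completions come back over repeated calls. A single identical result over a handful of runs is weak evidence, so a larger `runs` value is more informative.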
Journey Context:
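Developers often assume temperature=0 means greedy decoding (always picking the highest-probability token) and therefore identical output on every call. In practice, cloud-based LLM APIs run on distributed GPU clusters, and floating-point addition is not associative: the accumulation order can vary with the node, kernel, and batch that serve a request, so the same prompt can yield logits that differ in the last few bits. When two candidate tokens are nearly tied, that difference is enough to change the argmax, and once one token differs, the rest of the completion can diverge. Some providers may also leave other sampling steps, such as top-p (nucleus sampling), active at temperature 0; this varies by provider. Temperature 0 minimizes randomness but does not guarantee determinism.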
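The effect of accumulation order on the argmax can be reproduced on a CPU with ordinary Python floats. This is an illustrative sketch only: the token names and contribution values are made up, and the two summation directions stand in for the different reduction orders a GPU kernel might use.

```python
def accumulate(terms, reverse=False):
    """Sum terms one at a time with plain float addition, in the given direction."""
    total = 0.0
    for t in (reversed(terms) if reverse else terms):
        total += t
    return total


def greedy_pick(contributions, reverse=False):
    """Compute one logit per token, then return the argmax token and all logits."""
    logits = {tok: accumulate(terms, reverse) for tok, terms in contributions.items()}
    return max(logits, key=logits.get), logits


# Hypothetical per-token contributions; both sum to 0.6 in exact arithmetic.
contributions = {"A": [0.1, 0.2, 0.3], "B": [0.3, 0.2, 0.1]}

pick_fwd, logits_fwd = greedy_pick(contributions)
pick_rev, logits_rev = greedy_pick(contributions, reverse=True)

print(pick_fwd, logits_fwd)
print(pick_rev, logits_rev)
```

Both tokens have the same logit mathematically, yet the forward order selects A and the reverse order selects B, because rounding happens at different intermediate steps. Real logits are rarely exact ties, but near-ties among top candidates are enough for the same flip to occur.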
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:58:41.409152+00:00— report_created — created