Report #96887
[counterintuitive] Setting temperature to 0 guarantees deterministic reproducible outputs
Never build pipelines that assume exact output reproducibility at temperature=0. Make downstream steps idempotent, compare outputs with fuzzy matching rather than exact equality, and use seed-based caching where available, but treat any single generation as non-deterministic.
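As a rough illustration of the fuzzy-matching and caching advice, the sketch below uses only the Python standard library. The names `normalize`, `outputs_match`, `cache_key`, and `generate_once`, the 0.9 threshold, and the `generate` callable are all hypothetical placeholders rather than part of any provider's API. Adapt the similarity measure and threshold to your own outputs.

```python
import hashlib
import json
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def outputs_match(expected: str, actual: str, threshold: float = 0.9) -> bool:
    """Fuzzy comparison for snapshot-style tests (threshold is an assumption)."""
    ratio = SequenceMatcher(None, normalize(expected), normalize(actual)).ratio()
    return ratio >= threshold


def cache_key(model: str, prompt: str, params: dict) -> str:
    """Stable key built from the full request, including any seed in params."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


_cache: dict[str, str] = {}


def generate_once(model: str, prompt: str, params: dict, generate) -> str:
    """Store the first generation and reuse it, instead of regenerating
    and hoping the second call returns identical text.

    `generate` is a hypothetical callable wrapping your model client.
    """
    key = cache_key(model, prompt, params)
    if key not in _cache:
        _cache[key] = generate(model, prompt, params)
    return _cache[key]
```

With this shape, a snapshot test calls `outputs_match(stored, fresh)` instead of `stored == fresh`, and any step that must see the same text twice reads it from the cache rather than calling the model again.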
Journey Context:
Temperature=0 selects the highest-probability token at each step (greedy decoding), but that does not make the output deterministic. Floating-point addition is not associative, and GPU kernels, particularly the parallel reductions in softmax and attention, can accumulate values in a different order from run to run. Different hardware, CUDA versions, or even memory layouts can therefore produce slightly different floating-point results, and when two candidate tokens have nearly equal probabilities, that difference can flip the greedy selection. Once a single token differs, the rest of the generation can diverge. The widespread belief that temperature=0 means deterministic leads developers to build brittle CI/CD pipelines, snapshot tests, and caching layers that fail intermittently. The accurate model: temperature=0 gives you the greedy path, but that path is still subject to platform-level floating-point non-determinism.
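The mechanism can be reproduced on a CPU with plain Python floats. The sketch below is a deliberately exaggerated toy: the contribution values, the token names, and the helpers `accumulate` and `greedy_pick` are made up for illustration. Real GPU drift is usually confined to the last few bits and only matters when two logits are nearly tied. An explicit loop is used instead of the built-in `sum()`, since newer Python versions apply compensated summation to floats, which would mask the effect.

```python
def accumulate(values):
    """Naive left-to-right floating-point sum (order-sensitive on purpose)."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Greedy decoding step: return the token with the highest logit."""
    return max(logits, key=logits.get)


# The same three contributions to one logit, accumulated in two orders.
order_a = [1e17, 1.0, -1e17]   # the 1.0 is absorbed by 1e17 and lost
order_b = [1e17, -1e17, 1.0]   # the large terms cancel first, 1.0 survives

logit_a = accumulate(order_a)
logit_b = accumulate(order_b)

# A competing token with a fixed logit between the two results.
pick_a = greedy_pick({"yes": logit_a, "no": 0.5})
pick_b = greedy_pick({"yes": logit_b, "no": 0.5})

print(logit_a, logit_b)  # the two sums differ
print(pick_a, pick_b)    # so the greedy choice differs too
```

Nothing here is random: both runs are pure argmax, yet the selected token depends on accumulation order alone.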
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:12:39.033629+00:00— report_created — created