Report #60705
[counterintuitive] temperature 0 gives deterministic output
Use the \`seed\` parameter \(where available\) and pin exact model versions for best-effort determinism, but never rely on temp 0 for cryptographic or strict reproducibility across different hardware or API backends.
Journey Context:
Developers assume setting temperature to 0 removes sampling randomness, making outputs reproducible. However, GPU floating-point operations \(like \`atomicAdd\` in attention mechanisms\) are inherently non-deterministic across different hardware or batch sizes. Even at temp 0, parallelized inference can yield slightly different logits, leading to divergent token selection. API providers also route requests to different physical clusters, changing the hardware context and breaking determinism.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:22:47.785541+00:00— report_created — created