Report #52241
[counterintuitive] temperature 0 deterministic output
Set the `seed` parameter alongside `temperature=0` and pin the model version to get mostly deterministic outputs, and add fallback logic for the rare runs where floating-point variance across distributed GPU clusters changes the result.
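One possible shape for the fallback logic is sketched below. It is only an illustration under stated assumptions: `generate` is a hypothetical callable that wraps whatever client is in use (with `seed`, `temperature=0`, and a pinned model version already applied), and `stable_completion` and `normalize` are made-up helper names, not part of any library. The sketch calls the model a few times, ignores whitespace-only differences, and reports whether every run agreed.

```python
from collections import Counter
from typing import Callable, Tuple


def normalize(text: str) -> str:
    """Collapse whitespace so trivial formatting drift does not count as a mismatch."""
    return " ".join(text.split())


def stable_completion(
    generate: Callable[[str], str],  # hypothetical wrapper around the real API call
    prompt: str,
    attempts: int = 3,
) -> Tuple[str, bool]:
    """Return the most common normalized output and whether all runs agreed."""
    outputs = [normalize(generate(prompt)) for _ in range(attempts)]
    winner, votes = Counter(outputs).most_common(1)[0]
    return winner, votes == attempts


# Stand-in for a real model call: the third reply drifts slightly.
_replies = iter(["Paris", "Paris ", "Paris."])


def fake_generate(prompt: str) -> str:
    return next(_replies)


answer, unanimous = stable_completion(fake_generate, "Capital of France?")
print(answer, unanimous)
```

When `unanimous` comes back false, the caller can log the divergence, retry, or fall back to a cached answer rather than assume the output is reproducible.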
Journey Context:
Developers often assume that setting temperature to 0 makes the model return exactly the same tokens on every call. In fact, temperature 0 only forces greedy decoding (picking the highest-probability token at each step). Those probabilities are computed with floating-point operations, and floating-point addition is not associative, so the order in which partial results are summed affects the last bits of the result. Across different GPU configurations or hardware splits, the dot products in attention layers can therefore differ by microscopic amounts, which occasionally flips the top token when two candidates are nearly tied. Once one token differs, everything generated after it can diverge. Setting `seed` requests best-effort reproducibility; without it, no such attempt is made, so repeated calls are more likely to differ.
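The effect can be reproduced on a CPU with plain Python floats. The following is a minimal sketch, not a model of any real inference stack: the token names, the logit values, and the helpers `accumulate` and `greedy_pick` are invented for illustration. It sums the same three terms in two different orders and shows that greedy selection over a near-tied pair can change as a result.

```python
def accumulate(values):
    """Sum left to right, as a single reduction order would."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Index of the largest logit; ties go to the lowest index."""
    return max(range(len(logits)), key=logits.__getitem__)


# Same three terms, two summation orders (think: two hardware splits).
order_a = accumulate([0.1, 0.2, 0.3])
order_b = accumulate([0.2, 0.3, 0.1])
print(order_a == order_b, order_a, order_b)

# Hypothetical two-token vocabulary with a near-tied rival logit.
tokens = ["no", "yes"]
rival_logit = 0.6

pick_a = tokens[greedy_pick([rival_logit, order_a])]
pick_b = tokens[greedy_pick([rival_logit, order_b])]
print(pick_a, pick_b)
```

The two sums differ only in the last bit, yet that is enough to change which token greedy decoding selects when the competing logit sits right at the boundary.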
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:10:57.948506+00:00— report_created — created