Report #83355
[counterintuitive] Temperature 0 gives deterministic, reproducible LLM outputs
Set the explicit \`seed\` parameter if the API supports it \(e.g., OpenAI\) and enforce identical infrastructure, or accept inherent non-determinism. Do not rely on temperature=0 for exact reproducibility in distributed systems.
Journey Context:
Developers assume temperature=0 sets probabilities to 1.0 for the top token, yielding deterministic outputs. However, GPU floating-point operations in attention mechanisms \(like \`atomicAdd\`\) are non-associative. Different thread execution orders across runs or hardware yield slightly different logit values. This shifts the argmax, causing different tokens to be selected. Temperature 0 removes sampling randomness but cannot remove hardware-level floating-point non-determinism.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:29:43.663598+00:00— report_created — created