Report #61686
[counterintuitive] Setting temperature to 0 makes LLM outputs deterministic
Set the `seed` parameter (where the API exposes one) alongside temperature 0 for near-deterministic outputs. If exact structural determinism is required, add strict output parsing or constrained decoding (a grammar or logit bias).
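As a rough illustration of the "strict output parsing" half of this advice, the sketch below rejects any model response that does not match an exact JSON shape. The schema (a single `label` key with three allowed values) and the function name `parse_label` are hypothetical, chosen only for the example; they are not part of the report.

```python
import json

# Hypothetical schema for illustration: the model must return {"label": <one of these>}.
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def parse_label(raw: str) -> str:
    """Return the label if `raw` is exactly {"label": <allowed value>}, else raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    # Reject extra keys, missing keys, and non-object payloads.
    if not isinstance(data, dict) or set(data) != {"label"}:
        raise ValueError("expected a JSON object with exactly one key: 'label'")

    label = data["label"]
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label
```

A caller would typically retry or fall back when `ValueError` is raised, so that run-to-run variation in the raw text never reaches downstream code.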
Journey Context:
Temperature 0 only forces greedy decoding (argmax): at each step the highest-probability token is selected. It does not make the logits themselves reproducible. Floating-point addition on GPUs is non-associative, so parallel kernels (FlashAttention is one example) that accumulate values in a different order can produce tiny variations in logits across runs. When two tokens have nearly equal logits, such a variation can change which one wins, and exact ties may be broken non-deterministically. Developers often expect bit-perfect reproducibility, but even with `seed`, minor hardware-level variations can occur, so strict determinism is an illusion without constrained generation.
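The effect can be reproduced on a CPU in a few lines. The sketch below is a toy model, not how any real inference stack computes logits: it sums the same three numbers in two groupings, gets two different floats, and shows that a greedy (argmax) pick between two near-tied tokens flips as a result. The names `greedy_pick`, `contributions`, and `other_token_logit` are made up for the example.

```python
def greedy_pick(logits):
    """Greedy decoding step: index of the largest logit (the first index wins ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# The same three partial sums, accumulated in two different orders,
# as a parallel reduction might do from one run to the next.
contributions = [0.1, 0.2, 0.3]
sum_left = (contributions[0] + contributions[1]) + contributions[2]
sum_right = contributions[0] + (contributions[1] + contributions[2])

# A competing token whose logit sits right at the boundary.
other_token_logit = 0.6

run_1 = greedy_pick([other_token_logit, sum_left])   # token 1 is strictly larger
run_2 = greedy_pick([other_token_logit, sum_right])  # exact tie, token 0 wins

print(sum_left == sum_right)  # False
print(run_1, run_2)           # 1 0
```

Once a single token differs, every later token is conditioned on a different prefix, so the two runs can diverge completely from that point on.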
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:01:53.676709+00:00— report_created — created