Report #94324
[counterintuitive] Set temperature to 0 for the most accurate deterministic answer from the LLM
For reasoning tasks, consider self-consistency decoding (generate multiple reasoning paths with temperature > 0, then take the majority answer) instead of greedy decoding. Temperature 0 gives the most probable next token at each step, which is not the same as the most probable complete answer.
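A toy illustration of the gap between the most probable next token and the most probable complete answer. The distribution below is entirely made up for this sketch (the step names, answers, and probabilities are hypothetical, not measured from any model): one first step is individually the most likely, but two less likely first steps both lead to a different answer.

```python
from collections import defaultdict

# Hypothetical probabilities for the first reasoning step.
FIRST_STEP = {"a": 0.4, "b": 0.3, "c": 0.3}

# Hypothetical probabilities of each final answer given the first step.
ANSWER_GIVEN_STEP = {
    "a": {"17": 0.9, "42": 0.1},
    "b": {"42": 1.0},
    "c": {"42": 1.0},
}


def greedy_answer() -> str:
    """Pick the most probable option at each step, as temperature 0 does."""
    step = max(FIRST_STEP, key=FIRST_STEP.get)
    answers = ANSWER_GIVEN_STEP[step]
    return max(answers, key=answers.get)


def marginal_answer_probs() -> dict:
    """Sum the probability of every path that ends in the same answer."""
    totals = defaultdict(float)
    for step, p_step in FIRST_STEP.items():
        for answer, p_answer in ANSWER_GIVEN_STEP[step].items():
            totals[answer] += p_step * p_answer
    return dict(totals)


if __name__ == "__main__":
    print("greedy answer:", greedy_answer())
    print("answer probabilities:", marginal_answer_probs())
```

In this toy setup greedy decoding commits to step "a" and returns "17", while "42" carries more total probability because several paths lead to it.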
Journey Context:
The common belief is that temperature=0 (greedy decoding) produces the 'best' or 'most accurate' output because it is deterministic and picks the highest-probability token at each step. This conflates local optimality with global optimality. Greedy decoding can get trapped in sequences that look best token by token but are worse overall: once the model commits to a wrong early step in a reasoning chain, it tends to continue down that path with high confidence. Self-consistency (sampling multiple diverse reasoning paths and taking a majority vote over the final answers) has been reported to outperform greedy decoding on reasoning benchmarks, because it explores the reasoning space more broadly. The key insight: greedy decoding is not guaranteed to find the highest-probability sequence, and even the highest-probability sequence is not necessarily the highest-probability answer, since the same answer can be reached through many different reasoning paths whose probabilities add up. For coding agents, running 3-5 samples and voting can outperform a single greedy decode on complex reasoning tasks, at the cost of proportionally more generation calls.
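A minimal sketch of the voting step, assuming the caller supplies a `sample_answer` function. That name is hypothetical: it stands for whatever code calls the model with temperature > 0 and extracts the final answer from one reasoning path, returning `None` when no answer could be parsed. No specific LLM API is assumed here.

```python
from collections import Counter
from typing import Callable, Optional


def self_consistent_answer(
    sample_answer: Callable[[], Optional[str]],
    n_samples: int = 5,
) -> Optional[str]:
    """Sample several answers and return the most frequent one.

    `sample_answer` is a placeholder for one sampled model call
    (temperature > 0) followed by final-answer extraction.
    """
    answers = [sample_answer() for _ in range(n_samples)]
    # Ignore samples where no final answer could be extracted.
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None
    # On a tie, Counter.most_common keeps the answer seen first.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Stand-in sampler that replays fixed answers instead of calling a model.
    canned = iter(["42", "17", "42", None, "42"])
    print(self_consistent_answer(lambda: next(canned), n_samples=5))
```

Voting only works when final answers can be compared directly, so normalize them (strip whitespace, units, formatting) before counting. For free-form outputs such as code, a different agreement check, for example matching test results, would be needed.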
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:54:22.506015+00:00— report_created — created