Report #21363
[counterintuitive] Temperature 0 and greedy decoding always produce the best output for code generation
Use a small non-zero temperature (for example 0.1 to 0.2) for longer code generations to reduce the risk of repetitive loops; add a repetition penalty or loop detection as a fallback; and test whether temperature 0 actually produces degenerate output for your specific model and output length.
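The loop-detection fallback could be sketched roughly as follows. This is a minimal illustration, not a tested production guard: `find_tail_loop` and `generate_with_fallback` are hypothetical helper names, whitespace splitting stands in for a real tokenizer, and the `generate` callable is an assumed wrapper around whatever model client you use. The thresholds (`max_period`, `min_repeats`) are guesses that would need tuning per model and task.

```python
from typing import Callable, Sequence, Tuple


def find_tail_loop(tokens: Sequence[str], max_period: int = 50,
                   min_repeats: int = 3) -> int:
    """Return the period of a loop at the end of `tokens`, or 0 if none.

    A loop of period p means the last p tokens occur `min_repeats`
    times back to back at the tail of the sequence.
    """
    n = len(tokens)
    for period in range(1, max_period + 1):
        if period * min_repeats > n:
            break  # not enough tokens left to hold this many repeats
        tail = list(tokens[n - period:])
        if all(
            list(tokens[n - (k + 1) * period: n - k * period]) == tail
            for k in range(1, min_repeats)
        ):
            return period
    return 0


def generate_with_fallback(
    generate: Callable[[float], str],
    temperatures: Sequence[float] = (0.0, 0.1, 0.2),
) -> Tuple[str, float]:
    """Try each temperature in order; return the first loop-free output.

    `generate` is an assumed callable: temperature in, generated text out.
    If every attempt loops, the last attempt is returned unchanged.
    """
    output, used = "", temperatures[0]
    for used in temperatures:
        output = generate(used)
        if find_tail_loop(output.split()) == 0:
            break
    return output, used
```

Because short runs of identical tokens (closing brackets, for instance) are normal in code, a low `min_repeats` may flag legitimate output; checking only the tail keeps the heuristic cheap but will miss loops that the model later escapes on its own.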
Journey Context:
Developers often default to temperature 0 for code generation, expecting the most probable output to also be the best one. But greedy decoding is known to produce degenerate, repetitive loops in longer generations: once a high-probability token sequence has been emitted, always selecting the top token can lead the model to emit that same sequence again and again. This is especially problematic for coding agents generating multi-function files or long refactoring scripts. A small temperature gives the model a chance to escape these loops while keeping behavior nearly deterministic. Repetition penalties can also help but may introduce artifacts, for example by discouraging identifiers or brackets that legitimately need to repeat in code. The key tradeoff: temperature 0 picks the most probable token at each step, which does not guarantee the most probable or most coherent sequence overall, and coherence can degrade in long outputs.
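To make the effect of a small temperature concrete, here is a minimal sketch of temperature-scaled softmax over made-up logits. The function name and the logit values are illustrative assumptions; real inference stacks apply temperature in the same spirit but may combine it with top-p, top-k, or other filters, and may treat temperature 0 differently.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after dividing by `temperature`.

    Temperature 0 is treated here as greedy decoding: all probability
    mass goes to the highest-scoring token.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens; the first two are close.
logits = [2.0, 1.8, 0.5]
for t in (0.0, 0.2, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.2 the top token still dominates, but a close runner-up keeps a meaningful share of the probability, which is what lets sampling occasionally step off a repeating path. When the top token leads by a wide margin, the same temperature leaves the choice effectively deterministic.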
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:15:49.287017+00:00— report_created — created