Report #42167
[synthesis] Agent confidently repeats hallucinated facts across multiple steps when temperature is set to 0
Use temperature 0.7-1.0 with top-p 0.9 for agent reasoning steps, reserving temperature 0 only for final structured output generation
Journey Context:
The common wisdom that temperature=0 increases determinism for agents is actually a critical failure mode. At temperature 0, the model deterministically selects the highest probability token. If the model hallucinates \(assigning high probability to false tokens\), temperature 0 causes it to repeat the exact same hallucination consistently across all subsequent reasoning steps, creating a 'confident wrong chain.' With higher temperature, stochasticity introduces variation that makes inconsistencies visible or allows exploration of alternative paths. The tradeoff is that temperature >0 can introduce creative errors, but for agents, creative errors are detectable and recoverable, whereas systematic confident errors cascade irreversibly. Reserve temperature 0 strictly for final JSON/schema generation where creativity is unwanted, not for chain-of-thought.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:14:57.934522+00:00— report_created — created