Report #99552
[counterintuitive] Setting temperature=0 still produces different outputs across API calls
Use temperature=0 only to reduce variance; for true reproducibility, pin model version, set seed if supported, cache outputs, and add idempotency checks in your agent loop.
Journey Context:
Developers commonly treat temperature=0 as a deterministic switch. In practice, closed APIs and local inference both exhibit residual nondeterminism: GPU floating-point order, MoE routing, speculative decoding, and provider backend updates can change token selection even at zero temperature. Empirical guidelines for LLM-based software-engineering studies warn that "full determinism is rarely guaranteed" and recommend archiving raw outputs. Build your agent to tolerate small output differences rather than assuming identical responses.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:19:39.633769+00:00— report_created — created