Report #26955
[counterintuitive] Model outputs different results for the exact same prompt even with temperature set to 0
Accept inherent non-determinism in LLM APIs; implement retry logic and output validation rather than relying on exact reproducibility.
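The retry-and-validate approach can be sketched in a few lines. This is a minimal sketch under stated assumptions, not any provider's API. `call_model` is a hypothetical stand-in for whatever client function sends a prompt and returns text. `generate_with_validation` and `is_json_with_answer` are illustrative names. The validator checks the *shape* of the output (here, a JSON object with an `"answer"` key) rather than comparing it to a previous run's exact text.

```python
import json
from typing import Callable, Optional


def generate_with_validation(
    call_model: Callable[[str], str],
    prompt: str,
    validate: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the model until an output passes validation or attempts run out.

    call_model is a hypothetical stand-in for a real API client. Nothing here
    assumes that two calls with the same prompt return the same text.
    """
    for _ in range(max_attempts):
        output = call_model(prompt)
        if validate(output):
            return output
    return None  # caller decides how to handle repeated failures


def is_json_with_answer(text: str) -> bool:
    """Example validator: accept only a JSON object that has an "answer" key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "answer" in data


# Usage with a fake model whose output varies between calls.
responses = iter(["Sure! The answer is 4.", '{"answer": 4}'])


def fake_model(prompt: str) -> str:
    return next(responses)


result = generate_with_validation(fake_model, "What is 2+2? Reply in JSON.", is_json_with_answer)
print(result)  # {"answer": 4}
```

The first fake response fails validation and is retried. The second passes and is returned. Capping attempts with `max_attempts` keeps a persistently bad prompt from looping forever.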
Journey Context:
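Even with temperature 0, LLM APIs are not strictly deterministic. Floating-point addition is not associative, so when GPU kernels (especially reduced-precision FP16/FP8 kernels spread across distributed hardware) accumulate values in a different order from one run to the next, the resulting logits can differ slightly. Temperature 0 means greedy decoding picks the highest-scoring token, so when two candidates are nearly tied, a tiny difference can change which one wins. From that token on, the model is conditioned on a different prefix, so over a long sequence the variation compounds and the outputs diverge. This is an infrastructure and mathematics limitation, not a prompting error. You cannot prompt your way out of floating-point math.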
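A minimal Python illustration of the mechanism, assuming nothing about how any real GPU kernel is implemented. `reduce_sum` and `greedy_pick` are hypothetical helpers. The sketch adds the same three contributions to token "A"'s logit in two different orders and feeds each result into a greedy pick. The magnitudes are deliberately exaggerated so the rounding shows up in Python's 64-bit floats. Reduced-precision formats such as FP16 round far more coarsely, so comparable near-ties can occur at ordinary logit values.

```python
def reduce_sum(values):
    """Add values left to right, standing in for one fixed reduction order."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Temperature-0 decoding: return the highest-scoring token."""
    return max(logits, key=logits.get)


# The same three contributions to token "A"'s logit, added in two orders.
order_1 = [1e16, 1.0, -1e16]  # 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost
order_2 = [1e16, -1e16, 1.0]  # the large terms cancel first, so the 1.0 survives

for order in (order_1, order_2):
    logits = {"A": reduce_sum(order), "B": 0.5}
    print(logits["A"], greedy_pick(logits))
# 0.0 B
# 1.0 A
```

Mathematically both orders sum to 1.0, yet the chosen token differs. In a real model, every later token would then be generated from a different prefix.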
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:38:30.217981+00:00— report_created — created