Report #36078
[counterintuitive] Why can't I get deterministic output from the model even at temperature 0?
Accept that LLM output is inherently stochastic at the margins. Use temperature 0 for maximum consistency but design systems to handle variance. Use structured output modes for format reliability, not for content determinism. Add retry logic and output validation rather than expecting single-shot exactness.
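The retry-and-validate advice can be sketched as follows. This is a minimal illustration, not a reference implementation: `generate` is a hypothetical zero-argument callable standing in for whatever client call produces the model's text, and the expected payload shape (a JSON object with a string `label` field) is an assumption chosen only for the example.

```python
import json
from typing import Callable, Optional


def parse_answer(raw: str) -> Optional[dict]:
    """Return the payload if it is a JSON object with a string 'label', else None.

    The 'label' schema is an assumption made for this example.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and isinstance(data.get("label"), str):
        return data
    return None


def generate_validated(generate: Callable[[], str], max_attempts: int = 3) -> dict:
    """Call the (hypothetical) model wrapper until its output validates.

    Raises ValueError if no attempt produces a valid payload.
    """
    for _ in range(max_attempts):
        parsed = parse_answer(generate())
        if parsed is not None:
            return parsed
    raise ValueError(f"no valid output after {max_attempts} attempts")


# Usage with a stand-in generator that fails twice before succeeding.
fake_outputs = iter(["not json", '{"label": 42}', '{"label": "spam"}'])
result = generate_validated(lambda: next(fake_outputs))
print(result["label"])
```

The validator checks the parsed structure instead of comparing strings byte for byte, so two runs that differ only in whitespace or key order are both accepted.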
Journey Context:
Developers set temperature to 0 and expect bit-identical outputs across runs. But even at temperature 0, outputs can vary: floating-point non-determinism in GPU operations (especially across different hardware or batch sizes), implementation details in sampling, and top-k/top-p interactions all introduce small numerical differences. More importantly, many 'determinism problems' occur when the model is near 50/50 between two tokens. At temperature 0 the decoder picks the most likely token, but when the top tokens have near-equal probability, tiny numerical perturbations flip the choice. Every later token is then conditioned on a different prefix, so the outputs can diverge from that point on. This isn't a bug; it's a fundamental property of decoding from a learned distribution where the model hasn't committed strongly. The distribution itself encodes uncertainty, and temperature 0 just takes the mode, which is unstable when the distribution is flat.
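The two effects described here can be reproduced in a few lines of plain Python. This is a toy sketch under stated assumptions: the logits and the size of the perturbation are made-up numbers, and `greedy_pick` and `perturb` are hypothetical helpers, not part of any model API.

```python
# Illustrative only: the logits and the perturbation below are made-up numbers.

# 1) Floating-point addition is not associative, so summing the same values
#    in a different order (as parallel reductions may do) can change low bits.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False


def greedy_pick(logits):
    """Temperature-0 decoding step: index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def perturb(logits, noise):
    """Add a hypothetical per-logit numerical error."""
    return [value + err for value, err in zip(logits, noise)]


# 2) The same tiny error flips the choice only when the top tokens are close.
noise = [0.0, 3e-7, 0.0]            # assumed size of the numerical error
near_tie = [2.0000001, 2.0, -1.0]   # top two tokens almost equally likely
confident = [5.0, 2.0, -1.0]        # one token clearly dominates

print(greedy_pick(near_tie), greedy_pick(perturb(near_tie, noise)))    # 0 1
print(greedy_pick(confident), greedy_pick(perturb(confident, noise)))  # 0 0
```

The confident distribution is unaffected by the perturbation, while the near-tie flips from token 0 to token 1. This is why variance tends to show up at specific positions in the output, not uniformly.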
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:02:13.998042+00:00— report_created — created