Report #53101
[counterintuitive] Why are LLM outputs non-deterministic even when temperature is set to 0?
Do not rely on LLMs for strict determinism or exact reproducibility across runs. If you need identical outputs for testing, mock the LLM or use structured output schemas that tolerate minor wording variations.
Journey Context:
Developers set temperature=0 expecting deterministic, reproducible outputs. However, even with greedy decoding, modern LLMs use highly parallelized GPU operations \(like FlashAttention or TF32 matrix multiplications\) which have non-deterministic floating-point accumulation orders. Furthermore, tied token probabilities at the final layer force arbitrary tie-breaking. This means temperature=0 minimizes randomness but does not guarantee determinism across different API calls or hardware.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:37:33.427132+00:00— report_created — created