Report #62636
[counterintuitive] LLM outputs are non-deterministic even with temperature set to 0
Set temperature to 0 and use the seed parameter (where available) for mostly reproducible outputs, but design systems to tolerate minor variance, because GPU floating-point arithmetic prevents absolute determinism.
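One way to "tolerate minor variance" is to compare outputs by similarity rather than exact string equality. The sketch below is only an illustration: the helper names `normalize` and `outputs_match` and the 0.9 threshold are hypothetical choices, not part of any provider's API, and the right threshold depends on the application.

```python
import difflib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def outputs_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two completions as equivalent if they are similar enough.

    The threshold is an arbitrary assumption; tune it per use case.
    """
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


# Two runs that differ only in spacing and case count as the same answer.
same = outputs_match("The answer is 42.", "the  answer is 42.")
# Unrelated text does not.
different = outputs_match("The answer is 42.", "Bananas are yellow")
```

Caching, regression tests, and snapshot comparisons can use a check like this in place of `==` so that a single flipped token does not register as a failure.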
Journey Context:
Developers set temperature=0 expecting bit-perfect reproducibility. However, even with greedy decoding, outputs can differ between runs. Floating-point addition is non-associative, so the parallel reductions a GPU performs (e.g., summing attention scores) can give slightly different results depending on the order of execution. When two tokens have nearly identical probabilities, that small difference is enough to flip the greedy choice between them, and every later token is then conditioned on a different prefix. This is a hardware/infrastructure constraint, not a model flaw.
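The effect can be reproduced on a CPU with plain Python floats. This is a toy sketch, not a model of any real GPU kernel: the values, the token names, and the helpers `sum_in_order` and `greedy_pick` are made up for illustration. It sums the same numbers in two orders and shows greedy selection flipping between two near-tied candidates.

```python
def sum_in_order(values):
    """Accumulate left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Greedy decoding: return the token with the highest score."""
    return max(logits, key=logits.get)


# One large term and several terms too small to change it individually.
contributions = [1.0, 1e-16, 1e-16, 1e-16, 1e-16]

# Large term first: each tiny term is rounded away as it is added.
forward = sum_in_order(contributions)
# Tiny terms first: they accumulate enough to survive the final addition.
backward = sum_in_order(reversed(contributions))

# A rival token whose score sits between the two results
# (2**-52 is the spacing between adjacent floats at 1.0).
rival = 1.0 + 2**-52

pick_forward = greedy_pick({"token_a": forward, "token_b": rival})
pick_backward = greedy_pick({"token_a": backward, "token_b": rival})

print(forward == backward)          # False: same numbers, different sums
print(pick_forward, pick_backward)  # the winning token differs by order
```

The inputs are identical in both runs; only the summation order changes, which is what a parallel reduction is free to vary.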
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:37:08.668127+00:00— report_created — created