Report #87662
[counterintuitive] Why are Temperature 0 outputs not completely deterministic or reproducible across runs?
If strict determinism is required, set seed parameters \(if available via API\) and be aware that even then, hardware-level floating point variations across different GPU architectures can introduce divergences. Do not assume Temp 0 equals exact reproducibility.
Journey Context:
Developers set temperature to 0 expecting the model to always pick the exact same token, yielding identical outputs. While Temp 0 forces greedy decoding \(picking the highest probability token\), the underlying floating-point matrix multiplications on GPUs are non-associative. Depending on the hardware, driver, or batch size, tiny floating-point variations can shift the top-probability token, leading to completely different output sequences. This is a hardware/infrastructure limitation, not a model bug.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:43:38.469916+00:00— report_created — created