Report #91306
[counterintuitive] temperature 0 deterministic output
Set the `seed` parameter alongside `temperature=0` and use consistent infrastructure, but recognize that absolute determinism across different GPU architectures or distributed inference engines is not guaranteed.
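A minimal sketch for checking whether the seed-plus-`temperature=0` setup actually reproduces on a given deployment. It assumes you wrap your real client call in a function that sends `temperature=0` and the given seed, and returns the generated text together with some backend identifier. The names `check_reproducibility` and `is_reproducible`, and the `(prompt, seed) -> (text, fingerprint)` signature, are hypothetical. Some hosted APIs expose such an identifier (for example, OpenAI's Chat Completions API returns a `system_fingerprint` field), while others may not. Grouping outputs by fingerprint separates "the backend changed" from "same backend, different output".

```python
from collections import defaultdict
from typing import Callable, Dict, Set, Tuple

# Hypothetical signature: (prompt, seed) -> (generated_text, backend_fingerprint).
# Wrap your real client call so it sends temperature=0 and the given seed.
GenerateFn = Callable[[str, int], Tuple[str, str]]


def check_reproducibility(
    generate: GenerateFn, prompt: str, seed: int, runs: int = 5
) -> Dict[str, Set[str]]:
    """Call `generate` repeatedly with the same prompt and seed, and group the
    distinct outputs by the backend fingerprint reported for each call."""
    outputs: Dict[str, Set[str]] = defaultdict(set)
    for _ in range(runs):
        text, fingerprint = generate(prompt, seed)
        outputs[fingerprint].add(text)
    return dict(outputs)


def is_reproducible(outputs: Dict[str, Set[str]]) -> bool:
    """True only if every call ran on one backend and produced one text."""
    return len(outputs) == 1 and all(len(texts) == 1 for texts in outputs.values())


# Stand-in generator for demonstration only; a real one would call an API.
def fake_generate(prompt: str, seed: int) -> Tuple[str, str]:
    return f"echo:{prompt}:{seed}", "backend-a"


result = check_reproducibility(fake_generate, "hello", seed=42)
print(result)                   # {'backend-a': {'echo:hello:42'}}
print(is_reproducible(result))  # True
```

Against a real deployment, several distinct texts under one fingerprint, or several fingerprints across runs, is the failure mode this report describes.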
Journey Context:
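Developers assume temperature 0 enforces greedy decoding (strict argmax), making outputs reproducible. In practice, distributed inference frameworks (like vLLM or TensorRT-LLM) accumulate floating-point sums in orders that can differ across GPUs, kernels, and batch compositions. Because floating-point addition is not associative, these differences shift logits slightly; when two candidate tokens are nearly tied, the argmax can flip, and once one token differs the rest of the generation diverges. Parallel sampling trees can also alter token selection. Without setting a seed, even temperature 0 is non-deterministic across API calls; with a seed, minor changes to the backend infrastructure can still break exact reproducibility.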
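A toy illustration of the floating-point mechanism, using plain Python float64 rather than GPU fp16/bf16 kernels: the same three hypothetical partial contributions to one token's logit are summed in two different orders, and the greedy (argmax) choice flips. The magnitudes are deliberately exaggerated so the rounding is visible; on real hardware the per-logit differences are much smaller and only matter when two tokens are nearly tied, but the mechanism is the same.

```python
# Floating-point addition is not associative: reordering a sum can change it.
print(0.1 + 0.2 + 0.3 == 0.3 + 0.2 + 0.1)  # False


def accumulate(contributions):
    """Sum left to right, like one sequential reduction order."""
    total = 0.0
    for c in contributions:
        total += c
    return total


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical partial contributions to token A's logit, in two orders.
parts_order_1 = [1e16, 1.0, -1e16]  # 1e16 + 1.0 rounds back to 1e16
parts_order_2 = [1e16, -1e16, 1.0]  # the large terms cancel first

logit_b = 0.5  # token B's logit, identical in both runs

logits_1 = [accumulate(parts_order_1), logit_b]
logits_2 = [accumulate(parts_order_2), logit_b]

print(logits_1, argmax(logits_1))  # [0.0, 0.5] 1  -> greedy picks token B
print(logits_2, argmax(logits_2))  # [1.0, 0.5] 0  -> greedy picks token A
```

Nothing random is involved here, so a seed cannot fix it; only keeping the reduction order fixed (same kernels, hardware, and batching) keeps the result stable.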
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:51:04.699747+00:00— report_created — created