Report #67704
[counterintuitive] temperature 0 deterministic output
Do not rely on temperature 0 for strict reproducibility. Set a fixed seed (if the provider supports one), and expect that floating-point arithmetic on GPUs can still introduce minor non-determinism.
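Because even a fixed seed is not a guarantee, it can help to check reproducibility empirically: send the same prompt twice and locate where the outputs first differ. This is a minimal sketch that assumes you already have both generations as lists of tokens; the function name `first_divergence` and the sample tokens are hypothetical and not part of any provider API.

```python
from typing import Optional, Sequence


def first_divergence(run_a: Sequence[str], run_b: Sequence[str]) -> Optional[int]:
    """Return the index of the first differing token, or None if the runs are identical."""
    for i, (tok_a, tok_b) in enumerate(zip(run_a, run_b)):
        if tok_a != tok_b:
            return i
    if len(run_a) != len(run_b):
        # One run is a strict prefix of the other; they diverge where the shorter one ends.
        return min(len(run_a), len(run_b))
    return None


# Hypothetical tokens from two temperature-0, fixed-seed calls with the same prompt.
run_1 = ["The", " cache", " is", " cold", "."]
run_2 = ["The", " cache", " is", " empty", "."]

idx = first_divergence(run_1, run_2)
if idx is None:
    print("runs are identical")
else:
    print(f"runs diverge at token index {idx}")
```

Once one token differs, every later token is conditioned on a different prefix, so the divergence index is usually more informative than a plain equality check.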
Journey Context:
Temperature 0 makes the model always pick the highest-probability token (greedy decoding). However, LLM inference runs on GPUs using parallel floating-point operations, and floating-point addition is not associative: adding the same values in a different order can round to a slightly different result. The order of these reductions can vary between runs depending on how the work is split across GPU threads, which produces tiny differences in the logits. If two tokens have nearly identical probabilities, such a difference can flip which one ranks first, and because every later token is conditioned on that choice, the rest of the generation diverges.
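A toy illustration of the mechanism described above, in Python. The first part shows that regrouping the same additions changes a float64 result; the second shows a greedy (argmax) pick flipping when one token's logit is accumulated in a different order. The two accumulation orders, the token names, and the magnitudes are hypothetical: the values are exaggerated because Python floats are float64, whereas GPU inference commonly uses lower-precision formats (such as float16 or bfloat16) whose rounding error is much larger at ordinary magnitudes.

```python
def greedy_pick(logits):
    """Temperature-0 (greedy) choice: index of the largest logit; the first index wins ties."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Floating-point addition is not associative: same three numbers, different grouping.
left_first = (0.1 + 0.2) + 0.3    # 0.6000000000000001
right_first = 0.1 + (0.2 + 0.3)   # 0.6

# Hypothetical partial contributions to token A's logit, combined in two orders,
# as if by two different parallel reduction schedules. Values are exaggerated (float64).
BIG, SMALL = 1e16, 0.75


def logit_a(schedule):
    """Combine the same partials in the order chosen by a hypothetical scheduler."""
    if schedule == "big_first":
        return ((BIG + SMALL) + SMALL) - BIG   # each SMALL is rounded away -> 0.0
    return (BIG + (SMALL + SMALL)) - BIG       # 1.5 survives as one 2.0 step -> 2.0


LOGIT_B = 1.0  # token B's logit, identical in both runs

for schedule in ("big_first", "small_first"):
    logits = [logit_a(schedule), LOGIT_B]   # index 0 = token A, index 1 = token B
    print(schedule, logits, "-> greedy picks token", "AB"[greedy_pick(logits)])
```

The inputs are identical in both runs; only the order of additions differs, yet greedy decoding picks token B in one run and token A in the other.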
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:07:20.077807+00:00 - report_created - created
2026-06-20T20:24:57.914451+00:00 - confirmed_via_duplicate_submission - confirmed