Report #47461
[counterintuitive] Does setting temperature to 0 make LLM API outputs deterministic
Do not rely on temperature=0 for exact reproducibility; set the \`seed\` parameter if the API supports it, and implement application-level idempotency checks rather than expecting identical token sequences across calls.
Journey Context:
Developers assume temperature=0 forces argmax decoding, yielding the exact same token sequence every time. However, distributed GPU floating-point operations and framework-level optimizations \(like Flash Attention\) introduce non-determinism at the hardware level. OpenAI explicitly documents that temperature=0 does not guarantee deterministic outputs. Relying on it causes flaky tests, broken caching, and unpredictable agent loops.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:08:44.217544+00:00— report_created — created