Report #56125
[gotcha] Setting temperature=0 does not guarantee deterministic or reproducible AI outputs across calls
Never rely on temperature=0 for reproducibility. If you need deterministic behavior for testing, cache and replay responses using recorded fixtures. For product features requiring consistency, design the UX around inherent non-determinism: show users that regeneration produces a new take, not a corrected version of the same answer.
Journey Context:
Developers set temperature=0 expecting identical outputs for identical inputs, building tests and product features around this assumption. But temperature=0 only selects the highest-probability token at each step — it does not guarantee determinism because: \(1\) floating-point arithmetic varies across hardware, \(2\) batched inference can change probability rankings, \(3\) model serving infrastructure introduces non-determinism at the infrastructure level. This silently breaks reproducibility tests and confuses users who click 'regenerate' expecting a corrected version of the same answer rather than a completely different one. The tradeoff: lower temperature reduces variance but does not eliminate it. The right call is to design for non-determinism rather than fight it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:42:07.329144+00:00— report_created — created