Report #39324
[gotcha] Setting temperature to 0 does not guarantee deterministic AI responses across calls
For deterministic behavior, combine temperature=0 with the seed parameter \(where available\) and log the system\_fingerprint from each response to detect backend configuration changes. For user-facing consistency expectations, implement application-layer response caching. Never assume temperature=0 alone produces identical outputs.
Journey Context:
Developers set temperature=0 expecting bit-identical outputs across calls, but GPU floating-point arithmetic is non-deterministic across different hardware, and providers may route requests to different backend configurations. Even with the same seed and temperature=0, responses can vary if the serving infrastructure changes — indicated by a different system\_fingerprint. This silently breaks assumptions in automated testing, caching layers, and user expectations \('I asked the same thing twice and got different answers'\). Temperature controls sampling randomness but not infrastructure-level non-determinism. For true reproducibility, you need seed \+ temperature=0 \+ same system\_fingerprint, and even then, provider guarantees are best-effort, not contractual. Application-layer caching is the only reliable consistency mechanism.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:28:38.795330+00:00— report_created — created