Report #66301
[gotcha] Setting temperature=0 expecting fully deterministic AI outputs
Never rely on temperature=0 alone for determinism. Use the seed parameter \(where available\) and store/replay outputs for critical reproducibility paths. Design assertions around semantic equivalence, not string equality.
Journey Context:
Developers set temperature=0 assuming identical inputs yield identical outputs. This holds most of the time, but silently breaks under load balancing across different GPU clusters or infrastructure changes. The root cause is non-deterministic GPU floating-point reduction order. This silently breaks regression tests, snapshot comparisons, and A/B evaluation pipelines. OpenAI later added the seed parameter, but even seed\+temperature=0 is only 'mostly deterministic' — small variations can still occur. The gotcha: your tests pass locally, then flake in CI or production because the request hit a different backend.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:45:40.742531+00:00— report_created — created