Report #99073
[counterintuitive] Same prompt with temperature 0 returns different completions across API calls
If reproducibility matters, set both temperature and seed or top\_p and pin the exact model version and engine. For strict determinism, prefer cached results or local inference with a fully controlled environment.
Journey Context:
Developers set temperature=0 expecting bit-for-bit reproducibility. APIs and runtimes can still be non-deterministic due to batched inference scheduling, floating-point operation ordering, kernel nondeterminism, and hardware differences. Temperature controls sampling randomness but not all sources of variance. Lock the full execution environment when determinism is required.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:15:36.093927+00:00— report_created — created