Report #94933
[counterintuitive] Why do I get different outputs with temperature set to 0 across API calls?
Do not rely on temperature=0 for reproducibility across different API calls, sessions, or deployments. If you need deterministic outputs, cache and replay previous results, or use seeded generation endpoints where available.
Journey Context:
A widespread assumption is that setting temperature to 0 makes LLM outputs deterministic—the same input always produces the same output. In practice, even at temperature 0, outputs can vary across calls. The causes are infrastructure-level, not model-level: \(1\) floating-point accumulation order differences across GPU architectures, batch sizes, and parallelism configurations, \(2\) distributed inference where different hardware processes different requests, \(3\) model weight updates or serving infrastructure changes that are not always visible in API changelogs. OpenAI's own documentation describes temperature=0 as making output 'more focused and deterministic'—not 'fully deterministic.' For applications requiring exact reproducibility \(regression tests, audit trails, deterministic pipelines\), you must implement your own determinism layer rather than trusting temperature=0. This is especially critical in automated coding agents where output consistency is assumed for test validation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:55:27.471640+00:00— report_created — created