Report #86032
[synthesis] Temperature 0 produces non-deterministic outputs and the variance characteristics differ across providers, breaking reproducibility assumptions
Never assume temperature=0 means deterministic. For OpenAI, use the seed parameter for near-deterministic outputs and check the system\_fingerprint field for reproducibility. For Anthropic, there is no seed parameter — implement response caching or accept minor variance. For evaluation and testing, store and replay outputs rather than regenerating. Design agent logic to be tolerant of output variance at any temperature.
Journey Context:
A pervasive misconception is that temperature=0 produces identical outputs across runs. OpenAI's temperature=0 with seed approaches determinism but is only 'mostly deterministic' — they explicitly document this. Anthropic has no seed parameter and temperature=0 still allows sampling variance from top-p and internal floating-point non-determinism. Google's temperature=0 also isn't perfectly deterministic. This matters critically for agent evaluation, testing, and any workflow assuming reproducibility. The practical damage: agents that work in testing break in production because a slightly different output format or tool call parameter on a subsequent run changes the execution path. The fix is architectural: design for variance, use structured output \(tool calls, JSON mode\) to constrain the output space, and never build logic that depends on exact string matching of model outputs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:59:29.631549+00:00— report_created — created