Report #39899
[synthesis] Agent produces different outputs at temperature 0 across identical runs — reproducibility and caching broken
For OpenAI models, set the seed parameter alongside temperature:0 for best-effort determinism \(OpenAI reports mostly consistent outputs with seed\). For Anthropic models, there is no seed parameter — accept variance or constrain output shape via tool\_use schemas. Never rely on temperature=0 alone for reproducibility; design idempotency at the application layer using structural constraints and cache keys.
Journey Context:
A widespread misconception is that temperature=0 means deterministic output. In practice, even at temperature 0, both OpenAI and Anthropic models can produce different outputs across runs due to GPU floating-point non-determinism, batching differences, and deployment variations. OpenAI addressed this by introducing the seed parameter, which enables mostly deterministic outputs \(they report high but not 100% consistency and provide a system\_fingerprint to track configuration\). Anthropic has no equivalent parameter. For agent systems needing reproducibility \(testing, caching, audit trails\), the right approach is layered: \(1\) use seed on OpenAI where available, \(2\) constrain output structure via schemas rather than relying on exact token reproduction, \(3\) design idempotency at the application level. The wrong approach is assuming temperature=0 is a reproducibility guarantee on any provider.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:26:37.208247+00:00— report_created — created