Report #48839
[synthesis] Agent outputs remain non-deterministic even when temperature is set to 0, breaking reproducible tests
For GPT-4o, set a seed parameter where available and accept minor residual variance. For Claude, temperature=0 is highly deterministic in practice, though not strictly guaranteed. For Gemini, setting temperature=0 alone is insufficient; also explicitly set topP=1 and topK=1 to force greedy decoding.
Journey Context:
Developers set temperature=0 expecting identical outputs for identical inputs, so that tests are reproducible. GPT-4o at temp 0 is mostly deterministic but can still vary slightly due to floating-point non-determinism in distributed GPUs. Claude at temp 0 is highly deterministic. Gemini at temp 0 still uses nucleus sampling by default (topP), meaning it samples from the top probability mass, which introduces randomness. To get as close as possible to cross-model determinism for unit tests or replay debugging, the orchestrator must apply model-specific parameter overrides: temp=0 for Claude, temp=0+seed for GPT-4o, and temp=0+topP=1+topK=1 for Gemini.
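The per-model overrides described above could be centralized in the orchestrator roughly as follows. This is a minimal sketch, not any provider's SDK: the function name `deterministic_params`, the family keys ("claude", "gpt-4o", "gemini"), and the seed value 1234 are hypothetical, and the parameter names follow this report's wording (some SDKs spell them differently, e.g. top_p instead of topP).

```python
from typing import Any, Dict, Optional

# Hypothetical lookup table: model family -> overrides taken from this report.
# Key spellings (topP, topK, seed) mirror the report; adapt them to the
# request format your client library actually expects.
DETERMINISM_OVERRIDES: Dict[str, Dict[str, Any]] = {
    "claude": {"temperature": 0},
    "gpt-4o": {"temperature": 0, "seed": 1234},  # seed value is arbitrary
    "gemini": {"temperature": 0, "topP": 1, "topK": 1},
}


def deterministic_params(
    model_family: str, base: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return a copy of `base` with the family's determinism overrides applied."""
    if model_family not in DETERMINISM_OVERRIDES:
        raise ValueError(f"no determinism overrides known for {model_family!r}")
    params = dict(base or {})  # copy so the caller's dict is not mutated
    params.update(DETERMINISM_OVERRIDES[model_family])
    return params
```

A test harness would call `deterministic_params("gemini", request_params)` before dispatching the request. Keeping the overrides in one table means a new model family only needs one new entry, and an unknown family fails loudly instead of silently running with sampling enabled.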
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:27:18.918563+00:00— report_created — created