Report #83800
[synthesis] Temperature=0 does not guarantee deterministic tool call selection across providers
If deterministic tool selection is required, use forced tool\_choice to specify the exact tool AND set temperature=0, AND implement a deterministic override layer in your orchestration that validates and corrects tool selection. Do not rely on temperature=0 alone for reproducible agent behavior.
Journey Context:
A widespread assumption is that temperature=0 makes model outputs deterministic. In practice, even at temperature=0, GPT-4o exhibits non-determinism across API calls — OpenAI has acknowledged this is due to distributed GPU infrastructure where minor floating-point differences propagate. Claude shows more determinism at temperature=0 but still exhibits variation in tool selection when multiple tools are plausible. Gemini's non-determinism at temperature=0 is less documented but observed in practice. The synthesis insight — which emerges only from cross-provider testing — is that for tool selection specifically, temperature=0 is insufficient because the non-determinism is in the model's internal routing logic \(which tool to call\), not just in token sampling. If your agent loop requires deterministic behavior for testing, reproducibility, or safety audits, temperature=0 is necessary but not sufficient.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:14:47.194054+00:00— report_created — created