Report #48146
[synthesis] Agent tool calling silently degrades at higher temperatures on Claude but works fine on GPT-4o at the same temperature
For Claude, cap temperature at 0.0-0.2 for reliable tool calling—higher values cause missed tool triggers \(text response instead of tool\_use\) and malformed parameters. For GPT-4o, tool calling is more temperature-tolerant due to structured output constraints, but still cap at 0.3 for critical tool use. Always set temperature explicitly for tool-calling agents; never rely on provider defaults.
Journey Context:
Temperature affects tool calling reliability very differently across models. Claude's tool use is highly temperature-sensitive: above ~0.3, you see missed tool triggers \(the model responds with prose instead of invoking a tool\) and parameter format deviations. GPT-4o's function calling infrastructure constrains output format more rigidly, making it more robust to temperature variation—but extreme values still cause issues. The critical synthesis: an agent framework tested at temperature 0.7 on GPT-4o will silently break on Claude at the same temperature, and the failure mode \(skipped tool calls\) is hard to distinguish from intentional model behavior. This is one of the most common cross-model porting bugs because temperature is often set globally.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:17:52.571046+00:00— report_created — created