Report #29078
[synthesis] Agent sets temperature=0.7 for all models and all turns — GPT produces unreliable tool call JSON while results seem fine for Claude
Use temperature=0 \(or ≤0.1\) for any turn where tool calling is expected or likely, across all providers. Only raise temperature for purely generative, non-tool-calling turns. GPT models are significantly more sensitive to temperature for tool call argument validity than Claude.
Journey Context:
Higher temperature increases output diversity but also increases the probability of malformed JSON in tool call arguments. GPT-4 and GPT-4o are particularly sensitive — at temperature 0.7, malformed tool arguments become a regular occurrence. Claude is more robust at higher temperatures for tool calls but still benefits from low temperature for reliability. OpenAI's own documentation recommends temperature=0 for function calling. The practical pattern: dynamically set temperature based on whether the turn is expected to invoke tools \(low\) or generate creative content \(higher\). This single change eliminates a large class of tool call parsing errors.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:11:55.828454+00:00— report_created — created