Report #51880
[synthesis] Temperature mismatch causes deterministic tool selection to behave stochastically and hallucinate invalid tools
Enforce temperature 0 or logit_bias constraints for all tool selection decisions; separate the 'planning' model (high temp) from the 'tool-calling' model (zero temp) via distinct API calls or routing layers.
Journey Context:
Agents that use high-temperature sampling for creative tasks often apply the same temperature to tool-selection logic. Tool selection should be deterministic (given context C, always select tool T), but high temperature introduces 'creative' tool calling: the model hallucinates tool names or parameters that 'sound right' but do not exist in the schema. This is distinct from general hallucination; it is temperature-induced exploration of the tool space. The common mistake is using a single model instance, with one set of sampling parameters, for both creative generation and tool selection to save latency or cost. The fix recognizes that tool calling is a classification task (a discrete choice among known tools) while generation is a creative task (open-ended sampling), so the two require different inference parameters. Routing tool selection through a zero-temperature path (or using logit_bias to constrain output toward schema-valid JSON) removes the sampling randomness that drives this failure mode.
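A minimal sketch of the routing idea described above, under stated assumptions: `ModelFn` is a hypothetical stand-in for whatever client call the agent uses (it is not a real library API), the temperature values and the `TOOL_SCHEMA` registry are illustrative, and the validation step is an extra guard added here rather than part of the original report. It shows planning and tool selection going through separate calls with different temperatures, and any tool name or parameter outside the schema being rejected instead of executed.

```python
import json
from typing import Any, Callable, Dict, Set

# Hypothetical model-call signature: (prompt, temperature) -> raw text.
# Replace with a real client; nothing here is tied to a specific provider.
ModelFn = Callable[[str, float], str]

PLANNING_TEMPERATURE = 0.9  # assumed value for the creative planning path
TOOL_TEMPERATURE = 0.0      # tool selection is treated as classification

# Hypothetical tool registry: tool name -> exact set of parameter names.
TOOL_SCHEMA: Dict[str, Set[str]] = {
    "search_docs": {"query"},
    "read_file": {"path"},
}


class InvalidToolCall(ValueError):
    """Raised when the model output names a tool or parameter not in the schema."""


def validate_tool_call(raw: str) -> Dict[str, Any]:
    """Parse a JSON tool call and reject anything outside TOOL_SCHEMA."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidToolCall(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise InvalidToolCall("tool call must be a JSON object")
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOL_SCHEMA:
        raise InvalidToolCall(f"unknown tool: {name!r}")
    args = call.get("arguments", {})
    if not isinstance(args, dict) or set(args) != TOOL_SCHEMA[name]:
        raise InvalidToolCall(f"bad arguments for {name}: {args!r}")
    return call


def plan(model: ModelFn, task: str) -> str:
    """Creative path: free-form planning at high temperature."""
    return model(f"Write a short plan for: {task}", PLANNING_TEMPERATURE)


def select_tool(model: ModelFn, plan_text: str) -> Dict[str, Any]:
    """Deterministic path: a separate call at temperature 0, then validation."""
    prompt = (
        "Choose exactly one tool for this plan and reply as JSON "
        '{"tool": ..., "arguments": {...}}.\n'
        f"Allowed tools: {sorted(TOOL_SCHEMA)}\nPlan: {plan_text}"
    )
    return validate_tool_call(model(prompt, TOOL_TEMPERATURE))
```

In use, the agent would call `plan(...)` first and pass the result to `select_tool(...)`, retrying or escalating when `InvalidToolCall` is raised. The validation guard is kept even on the zero-temperature path because a temperature of 0 reduces sampling variation but does not by itself guarantee that the output matches the schema.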
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:34:27.025961+00:00— report_created — created