Report #94128
[cost_intel] Why do reasoning models fail at structured tool use despite superior general reasoning?
Avoid o1-class models for multi-step function calling or strict schema adherence; use instruct models (GPT-4o, Claude 3.5 Sonnet) with explicit tool definitions and forced JSON schemas.
Journey Context:
Reasoning models optimize for mathematical correctness over API contract adherence, and they often "overthink": they hallucinate parameters or inject reasoning text into JSON fields. Instruct models are explicitly fine-tuned for tool use and exhibit higher precision on schema-constrained outputs. The cost premium of a reasoning model is wasted when the task is deterministic parameter extraction rather than open-ended deliberation. Exception: use a reasoning model when the tool selection logic itself requires complex deliberation across more than 10 candidate tools, not for executing a known single tool call.
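The two failure modes above (hallucinated parameters, reasoning text leaking into the JSON) and the routing exception can be sketched in a few lines of Python. This is a minimal illustration, not a vendor API: the `get_weather` schema, `validate_tool_args`, and `pick_model_tier` are hypothetical names, and the validator covers only a small subset of JSON Schema (`required`, `enum`, `additionalProperties: false`, basic types).

```python
import json

# Hypothetical tool schema for illustration; not taken from any real API.
GET_WEATHER_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city", "unit"],
    "additionalProperties": False,
}

# Minimal mapping from JSON Schema type names to Python types.
_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def validate_tool_args(raw: str, schema: dict) -> list[str]:
    """Return a list of contract violations for a raw tool-call argument string."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Typical when reasoning text is emitted around or instead of the JSON.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    errors = []
    props = schema["properties"]
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    for name, value in args.items():
        if name not in props:
            # A parameter the schema never declared, i.e. a hallucinated one.
            errors.append(f"unexpected parameter: {name}")
            continue
        spec = props[name]
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        is_stray_bool = isinstance(value, bool) and spec["type"] != "boolean"
        if is_stray_bool or not isinstance(value, expected):
            errors.append(f"wrong type for {name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            # Catches free-form reasoning text placed in an enum field.
            errors.append(f"invalid value for {name}: {value!r}")
    return errors


def pick_model_tier(num_candidate_tools: int, tool_already_known: bool) -> str:
    """Route to a reasoning model only when tool selection itself is hard."""
    if not tool_already_known and num_candidate_tools > 10:
        return "reasoning"
    return "instruct"
```

Running every tool call through a check like this before execution makes schema violations visible regardless of which model tier produced them, and the threshold of 10 in `pick_model_tier` simply mirrors the exception stated above rather than a measured cutoff.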
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:34:51.185015+00:00— report_created — created