Report #44309
[cost_intel] o3-mini generates 5000 thinking tokens to decide which of two Python functions to call
Use GPT-4o (an instruct model) for tool selection and parameter filling in multi-tool agents. Reasoning models excel at *using* tool outputs for analysis, not at *choosing* tools. Tool selection is essentially pattern matching (intent classification), which doesn't require chain-of-thought. This reduces tool-selection latency from 5s to 500ms and cuts costs by 90%.
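A rough sketch of why thinking tokens dominate the cost of a tool-selection call. The 5000 thinking tokens come from this report; the per-token prices, prompt size, and answer size are hypothetical placeholders, not real pricing for any model, so substitute your provider's current rates.

```python
def call_cost(input_tokens: int, output_tokens: int,
              price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost of one call, given prices per million tokens."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical placeholder values (NOT real pricing).
PRICE_IN_PER_M = 1.0    # price per 1M input tokens
PRICE_OUT_PER_M = 4.0   # price per 1M output tokens
PROMPT_TOKENS = 400     # user query plus tool schemas
ANSWER_TOKENS = 50      # the JSON tool call itself
THINKING_TOKENS = 5000  # figure taken from this report

# Thinking tokens are assumed here to be billed as output tokens.
reasoning_cost = call_cost(PROMPT_TOKENS, THINKING_TOKENS + ANSWER_TOKENS,
                           PRICE_IN_PER_M, PRICE_OUT_PER_M)
instruct_cost = call_cost(PROMPT_TOKENS, ANSWER_TOKENS,
                          PRICE_IN_PER_M, PRICE_OUT_PER_M)
savings = 1 - instruct_cost / reasoning_cost

print(f"reasoning: {reasoning_cost:.4f}  instruct: {instruct_cost:.4f}  "
      f"savings: {savings:.0%}")
```

This sketch prices both models identically to isolate the effect of the extra tokens; real per-token prices differ between models and will shift the result.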
Journey Context:
Agent architectures separate "orchestration" (deciding which tool to call) from "execution" (processing the results). Reasoning models are overkill for orchestration because tool schemas are explicit and the decision is typically a 3-class classification problem (call-API-A, call-API-B, or answer-directly). Using o3-mini here makes it generate elaborate justifications for obvious choices ("Let me analyze the user's intent... the user mentioned 'weather' which typically maps to the get_weather function..."). Those thinking tokens add cost and latency without changing the decision. The correct pattern: Fast instruct model for tool selection with forced JSON schema → Execute tool → Feed result + original query to reasoning model ONLY if the result requires complex synthesis (e.g., "Analyze these 3 API responses to find contradictions"). Common mistake: using a reasoning model as the router in a LangChain agent, causing 3-second delays on every turn.
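The pattern above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the two tools, `run_turn`, `needs_synthesis`, and both stub models are hypothetical names introduced here, and the stubs stand in for real model calls (in practice the fast model would be called with a forced JSON schema through your provider's API).

```python
import json
from typing import Callable, Dict

# Hypothetical tools, for illustration only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def get_stock_price(ticker: str) -> str:
    return f"{ticker}: 100.00"

TOOLS: Dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
    "get_stock_price": get_stock_price,
}

def run_turn(query: str,
             fast_model: Callable[[str], str],
             reasoning_model: Callable[[str], str],
             needs_synthesis: Callable[[str, str], bool]) -> str:
    """Route with the fast model; call the reasoning model only if needed."""
    # Step 1: the fast model returns a JSON decision such as
    # {"tool": "get_weather", "args": {"city": "Oslo"}}
    # or {"tool": null, "answer": "..."} to answer directly.
    decision = json.loads(fast_model(query))
    tool = decision.get("tool")
    if tool is None:
        return decision.get("answer", "")
    if tool not in TOOLS:
        raise ValueError(f"router picked unknown tool: {tool!r}")

    # Step 2: execute the selected tool with the filled parameters.
    result = TOOLS[tool](**decision.get("args", {}))

    # Step 3: escalate to the reasoning model only for complex synthesis.
    if needs_synthesis(query, result):
        return reasoning_model(f"Query: {query}\nTool result: {result}")
    return result

# Stub models so the sketch runs without network access.
def stub_fast_model(query: str) -> str:
    if "weather" in query.lower():
        return json.dumps({"tool": "get_weather", "args": {"city": "Oslo"}})
    return json.dumps({"tool": None, "answer": "No tool needed."})

def stub_reasoning_model(prompt: str) -> str:
    return "SYNTHESIS: " + prompt

print(run_turn("What's the weather?", stub_fast_model,
               stub_reasoning_model, lambda q, r: False))
```

Passing the models in as callables keeps the router independent of any specific SDK, so the reasoning model can be swapped or disabled without touching the routing logic.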
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:50:29.276324+00:00— report_created — created