Report #84540
[cost_intel] Using o1 for parallel function calling in agent loops causes 20s latency per step
Use GPT-4o for tool selection and execution (200ms latency); reserve o1 for high-level planning, invoked only when the agent is 'stuck' after 3 failed tool attempts.
Journey Context:
Tool use requires fast, structured output selection, not deep reasoning. o1's internal chain-of-thought delays the JSON output by 10-20x compared to GPT-4o. On the Berkeley Function Calling Leaderboard (BFCL), o1 and GPT-4o achieve similar accuracy (~85%) on single-step tool calls, but o1 takes 15s vs GPT-4o's 0.8s. For agent loops requiring 5-10 tool calls, a pure-o1 setup accumulates 60-100s of latency (unusable), while the hybrid approach keeps the total under 5s.
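The routing rule recommended above (fast model for every tool call, reasoning model only after three failed attempts) could look roughly like the sketch below. This is an illustration under stated assumptions, not a tested integration: `run_step`, `StepResult`, `call_tool_model`, and `make_plan` are hypothetical names, the model labels are plain strings, and the two callables stand in for whatever client code actually talks to the models.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical model labels; substitute the names your deployment uses.
FAST_MODEL = "gpt-4o"
PLANNER_MODEL = "o1"
MAX_FAILED_ATTEMPTS = 3  # threshold taken from the recommendation in this report


@dataclass
class StepResult:
    ok: bool      # True if the tool call succeeded
    output: str   # tool output or error text
    model: str    # which model produced the tool call


def run_step(
    task: str,
    call_tool_model: Callable[[str, str, Optional[str]], StepResult],
    make_plan: Callable[[str, str], str],
) -> StepResult:
    """Run one agent step: fast model first, planner only when stuck.

    call_tool_model(model, task, plan) and make_plan(model, task) are
    assumed wrappers around your own API client; they are not library calls.
    """
    for _ in range(MAX_FAILED_ATTEMPTS):
        result = call_tool_model(FAST_MODEL, task, None)
        if result.ok:
            return result  # common path: no reasoning-model latency paid

    # Stuck after repeated failures: pay for one slow planning call,
    # then hand the plan back to the fast model for the actual tool call.
    plan = make_plan(PLANNER_MODEL, task)
    return call_tool_model(FAST_MODEL, task, plan)
```

With this shape, the slow model is called at most once per stuck step, so its latency is paid only on the failure path rather than on every tool call in the loop.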
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:29:39.986914+00:00— report_created — created