Report #74961
[cost_intel] Using o1/o3 for multi-turn tool-use loops in synchronous workflows
Avoid o1/o3 for ReAct-style tool loops that must complete in under 3 s total. Use 4o/4o-mini for tool execution, and reserve reasoning models for single-shot planning phases or offline analysis. If tool use must be combined with reasoning, prefer parallel tool calling with o1 (if available) over sequential loops.
Journey Context:
o1 and o3 currently have limited or high-latency support for function calling. A ReAct loop that takes 500 ms with 4o can balloon to 30-60 s with o1, because the model spends thinking tokens before each tool call and that per-step delay accumulates across the loop. This compound latency kills interactive agents. The correct architecture is to use a cheap model for tool execution (search, calculator, DB lookup) and to use o1 only for the initial plan generation or the final synthesis, not for the intermediate steps. Alternatively, use o1 in a "judge" pattern after the cheap model has produced a candidate answer.
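The split described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `planner`, `executor`, and `judge` are hypothetical callables standing in for real model API calls (a reasoning model, a cheap model, and a reasoning model again), and the `tool: argument` plan format is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins: a "model" is any function from prompt to text.
ModelFn = Callable[[str], str]
ToolFn = Callable[[str], str]


@dataclass
class Step:
    tool: str
    arg: str


def parse_plan(plan_text: str) -> List[Step]:
    """Parse 'tool: argument' lines (assumed plan format, not a standard)."""
    steps = []
    for line in plan_text.splitlines():
        tool, sep, arg = line.partition(":")
        if sep and tool.strip():
            steps.append(Step(tool.strip(), arg.strip()))
    return steps


def run_agent(question: str, planner: ModelFn, executor: ModelFn,
              tools: Dict[str, ToolFn], judge: Optional[ModelFn] = None) -> dict:
    # One reasoning-model call: the whole plan is produced up front.
    steps = parse_plan(planner(question))

    # Tool execution makes no reasoning-model call per step.
    observations = []
    for step in steps:
        tool = tools.get(step.tool)
        result = tool(step.arg) if tool else "unknown tool"
        observations.append(f"{step.tool}({step.arg}) -> {result}")

    # The cheap model turns the observations into a candidate answer.
    context = "\n".join(observations)
    answer = executor(f"Question: {question}\nObservations:\n{context}")

    # Optional "judge" pattern: one more reasoning-model call on the candidate.
    verdict = judge(f"Question: {question}\nCandidate: {answer}") if judge else None
    return {"answer": answer, "observations": observations, "verdict": verdict}


def demo() -> dict:
    """Run the pipeline with stub models and count planner invocations."""
    calls = {"planner": 0}

    def planner(prompt: str) -> str:
        calls["planner"] += 1
        return "search: o1 tool latency\ncalculator: 2+2"

    def executor(prompt: str) -> str:
        return "candidate answer"

    tools = {"search": lambda a: f"results for {a}", "calculator": lambda a: "4"}
    out = run_agent("q", planner, executor, tools, judge=lambda p: "approve")
    out["planner_calls"] = calls["planner"]
    return out
```

With this shape the reasoning model is invoked a fixed number of times (once to plan, optionally once to judge) no matter how many tool steps the plan contains, so its thinking latency does not multiply with loop length. The trade-off is that the plan cannot adapt to intermediate tool results unless a re-planning step is added.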
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:25:13.760076+00:00— report_created — created