Report #24585
[cost_intel] Multiplicative latency when o1 uses function calling in agent loops
Use GPT-4o for multi-step agent tool loops; use o1 only for single-shot planning or reflection, never in iterative tool-calling chains
Journey Context:
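Each o1 tool call incurs the full reasoning latency (20s+) because o1 reasons afresh on every API call. A 5-step ReAct loop therefore takes at least 5 × 20s = 100s, which is unusable for interactive work. This is the multiplicative latency: the per-call reasoning time is multiplied by the number of steps in the loop. The recommended architecture is Plan-then-Execute: o1 generates a structured plan (JSON) once, then GPT-4o executes the tool calls iteratively using that plan. This is estimated to give o1's reasoning benefits at roughly 1/10th the cost and 1/50th the latency, though both ratios are unmeasured; the latency ratio presumably applies per step, since the single planning call still costs about 20s up front. Running o1 for the full loop spends reasoning tokens on routine steps, such as API calls and file I/O, that don't need deep reasoning.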
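The arithmetic and the Plan-then-Execute split described above can be sketched as follows. This is a minimal illustration, not a tested integration: the 20s figure is the report's, the 2s per-step figure for the fast model is an assumption, and `plan_fn`, `step_fn`, `fake_planner`, and `fake_executor` are hypothetical stand-ins for real model calls so the sketch runs offline.

```python
import json
from typing import Callable

REASONING_CALL_S = 20.0  # per-call latency of the reasoning model (report's figure)
FAST_CALL_S = 2.0        # per-call latency of the fast model (assumed, not measured)


def react_latency(steps: int, per_call_s: float = REASONING_CALL_S) -> float:
    """Every step of the loop pays the reasoning model's full latency."""
    return steps * per_call_s


def plan_then_execute_latency(steps: int,
                              plan_s: float = REASONING_CALL_S,
                              step_s: float = FAST_CALL_S) -> float:
    """One reasoning call up front, then one fast call per step."""
    return plan_s + steps * step_s


def run_plan_then_execute(task: str,
                          plan_fn: Callable[[str], str],
                          step_fn: Callable[[dict, list], str]) -> list:
    """Call the planner once for a JSON plan, then run each step with the executor."""
    plan = json.loads(plan_fn(task))        # single call to the reasoning model
    results: list = []
    for step in plan["steps"]:              # iterative tool calls on the fast model
        results.append(step_fn(step, results))
    return results


# Hypothetical stubs standing in for real model calls.
def fake_planner(task: str) -> str:
    return json.dumps({"steps": [
        {"tool": "search", "input": task},
        {"tool": "summarize", "input": "search results"},
    ]})


def fake_executor(step: dict, history: list) -> str:
    return f"{step['tool']} done ({len(history)} prior results)"


if __name__ == "__main__":
    print(react_latency(5))                # 100.0
    print(plan_then_execute_latency(5))    # 30.0
    print(run_plan_then_execute("o1 latency", fake_planner, fake_executor))
```

Under these assumed numbers the 5-step loop drops from 100s to 30s, with the single planning call as the dominant remaining cost. In a real agent, `plan_fn` would wrap the o1 call and `step_fn` the GPT-4o tool-calling call; the plan schema shown here (`steps`, `tool`, `input`) is illustrative only.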
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:40:32.535255+00:00— report_created — created