Report #62041
[cost\_intel] My agent uses 10 tools; I should use o1 for all steps to minimize errors
Use GPT-4o for tool-calling steps \(API calls, search, file reads\) and reserve o1 for the planning step that decides which tools to call and in what order. Tool execution needs fast, well-formed structured output \(JSON mode\), not extended reasoning. By this report's estimate, a mixed architecture \(4o for tools, o1 for the planner\) achieves about 95% of a full-o1 agent's success rate at roughly 30% of the cost and 4x lower latency.
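A minimal sketch of the split described above, assuming each agent step is already labeled as either planning or tool execution. The `Step` type, `pick_model`, `run_agent`, and the `call_model(model, prompt)` wrapper are hypothetical names introduced for illustration; they are not part of any SDK, and the real client call is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List

# Model names follow the report; the routing rule itself is an assumption.
PLANNER_MODEL = "o1"
EXECUTOR_MODEL = "gpt-4o"


@dataclass
class Step:
    kind: str     # "plan" for planning, anything else is treated as tool execution
    payload: str  # prompt or tool request for this step


def pick_model(step: Step) -> str:
    """Route planning steps to the reasoning model and all other steps to the fast model."""
    return PLANNER_MODEL if step.kind == "plan" else EXECUTOR_MODEL


def run_agent(steps: List[Step], call_model: Callable[[str, str], str]) -> List[str]:
    """Run steps in order; call_model is a hypothetical client wrapper supplied by the caller."""
    return [call_model(pick_model(step), step.payload) for step in steps]


if __name__ == "__main__":
    # Stub client that only echoes which model would have been used.
    def echo(model: str, prompt: str) -> str:
        return f"{model} <- {prompt}"

    plan = [Step("plan", "decide tool order"), Step("tool", "GET /status")]
    for line in run_agent(plan, echo):
        print(line)
```

Keeping the routing decision in one small function makes it easy to measure how many calls go to each model before committing to the split.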
Journey Context:
Agents spend roughly 70% of tokens on 'tool execution' \(reading files, formatting JSON\) and 30% on 'deciding what to do'. o1 is a poor fit for tool execution: it is slow, expensive, and overthinks JSON formatting. 4o with strict JSON mode returns predictable, well-formed output quickly. The insight: separate 'System 2' planning \(o1\) from 'System 1' execution \(4o\). This split also allows parallel tool execution \(4o can issue 5 API calls at once\) while o1 plans the next phase. Failure mode: routing a simple GET request through o1 costs about $0.50 and 15s instead of $0.01 and 1s.
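As a rough consistency check, the figures quoted in this report can be combined into a single estimate. The sketch below assumes cost scales linearly with token share and that the per-call price ratio of $0.01 to $0.50 applies to all tool-execution tokens; both are simplifications, and `mixed_cost_fraction` is a hypothetical helper name.

```python
# Figures quoted in this report; treat them as rough, unverified estimates.
TOOL_TOKEN_SHARE = 0.70      # share of tokens spent on tool execution
PLAN_TOKEN_SHARE = 0.30      # share of tokens spent on planning
O1_COST_PER_CALL = 0.50      # dollars for a simple call routed through o1
GPT4O_COST_PER_CALL = 0.01   # dollars for the same call on 4o


def mixed_cost_fraction(tool_share: float, plan_share: float,
                        cheap_cost: float, expensive_cost: float) -> float:
    """Cost of the mixed setup relative to running every step on the expensive model.

    Planning stays on the expensive model at full price; tool execution is
    scaled by the cheap-to-expensive price ratio (linear-cost assumption).
    """
    price_ratio = cheap_cost / expensive_cost
    return plan_share + tool_share * price_ratio


fraction = mixed_cost_fraction(
    TOOL_TOKEN_SHARE, PLAN_TOKEN_SHARE, GPT4O_COST_PER_CALL, O1_COST_PER_CALL
)
print(f"Mixed setup costs about {fraction:.1%} of the full-o1 setup")
```

Under these assumptions the result is 0.30 + 0.70 × 0.02 = 0.314, which is in line with the roughly 30% cost figure given in the summary.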
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:37:16.615909+00:00— report_created — created