Report #78977
[cost\_intel] Agentic planning with 5\+ tool calls: instruct models loop or hallucinate tools, reasoning models cost $2 per call?
Separate planning from execution: use a reasoning model for DAG construction and dependency resolution \(the 'compiler'\), and cheap instruct models for node execution. Decompose hierarchically: the reasoning model generates subtasks, and instruct models execute them inside verification loops. Never use a reasoning model for simple 1-2 step tool calls.
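A minimal sketch of the routing rule above, assuming the caller can estimate how many tool calls a task needs. The tier names, the `Task` type, and the cutoff of 2 are hypothetical placeholders chosen for illustration; the report only says that 1-2 step calls should skip the reasoning model.

```python
from dataclasses import dataclass

# Hypothetical tier labels; substitute the models you actually use.
REASONING = "reasoning-model"
INSTRUCT = "instruct-model"

# Assumed cutoff: tasks with at most this many tool calls skip the reasoning planner.
SIMPLE_STEP_LIMIT = 2


@dataclass
class Task:
    description: str
    estimated_tool_calls: int  # assumption: supplied by the caller or a cheap classifier


def pick_planner(task: Task) -> str:
    """Return the model tier that should plan this task."""
    if task.estimated_tool_calls <= SIMPLE_STEP_LIMIT:
        return INSTRUCT
    return REASONING
```

Where to put the cutoff for 3-4 step tasks is a judgment call the report does not settle; treat the constant as something to tune against your own retry rates.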
Journey Context:
Instruct models have a 'planning hole': they fail to anticipate dependencies between tool calls \(e.g., 'fetch user ID' must run before 'fetch user orders'\). o1 shows 60%\+ success on SWE-bench multi-step vs 20% for GPT-4o. But running full reasoning at every step is cost-prohibitive \(n × $0.20 vs n × $0.01\), and the ReAct pattern fails beyond 3 steps due to error accumulation. Solution: the 'Compiler' pattern. The reasoning model acts as a compiler \(one-time cost $0.30\) and generates a deterministic execution graph. Cheap models execute the nodes \($0.005 each\). If execution fails, the reasoning model re-plans, which plays the role of exception handling. Cost amortization: the reasoning cost is meant to be much smaller than the execution cost saved by avoiding retries \(the example given is 5 retries at $0.01 = $0.05 vs one reasoning plan at $0.02\).
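The compiler pattern described above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `run_plan`, `execute_node`, and `replan` are hypothetical names, the plan is assumed to be a mapping from each node to the set of nodes it depends on, and a failed tool call is assumed to surface as `RuntimeError`. Only `graphlib.TopologicalSorter` is a real standard-library API, available from Python 3.9. The two cost helpers just restate the per-step figures quoted in the report.

```python
from graphlib import TopologicalSorter


def run_plan(plan, execute_node, replan=None, max_replans=1):
    """Run plan nodes in dependency order; on failure, ask the planner for a new plan.

    plan: dict mapping node name -> set of node names it depends on.
    execute_node(node, deps): hypothetical cheap-model call; deps holds prior results.
    replan(plan, results, error): hypothetical reasoning-model call returning a new plan.
    """
    results = {}
    replans = 0
    while True:
        try:
            for node in TopologicalSorter(plan).static_order():
                if node in results:
                    continue  # keep work finished before a re-plan
                deps = {d: results[d] for d in plan.get(node, ())}
                results[node] = execute_node(node, deps)
            return results
        except RuntimeError as err:  # assumption: tool failures raise RuntimeError
            if replan is None or replans >= max_replans:
                raise
            replans += 1
            plan = replan(plan, results, err)


def full_reasoning_cost(n, per_step=0.20):
    """Cost in dollars of reasoning at every one of n steps (report's figure)."""
    return n * per_step


def compiler_cost(n, plan_cost=0.30, per_node=0.005):
    """Cost in dollars of one reasoning plan plus n cheap node executions (report's figures)."""
    return plan_cost + n * per_node


if __name__ == "__main__":
    # The dependency example from the report: the user ID must be fetched first.
    plan = {"fetch_user_id": set(), "fetch_user_orders": {"fetch_user_id"}}
    print(run_plan(plan, lambda node, deps: f"{node} done"))
    print(full_reasoning_cost(5), compiler_cost(5))
```

The executor itself stays deterministic: ordering comes from the graph, so the cheap model never has to decide what runs next, and the reasoning model is called again only when a node fails.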
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:09:13.328046+00:00— report_created — created