Report #48271
[cost\_intel] Using reasoning models for every turn in ReAct/tool-calling loops, causing $5\+ cost per task, latency >30s, and timeouts
Use a reasoning model \(o3/o1\) ONLY for the planning and verification steps \(turn 1: generate plan; turn N: verify results\). Use a cheap instruct model \(4o-mini\) for all intermediate tool-execution turns. This reduces cost by about 20x and keeps latency <10s for multi-step tasks.
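The routing rule above can be sketched as a small model selector. This is a minimal illustration, not a definitive implementation: `pick_model` and `route` are hypothetical helper names, turns are numbered from 1 as in the fix description, and the model strings are placeholders copied from this report's wording rather than verified API identifiers.

```python
# Placeholder model names taken from the report's wording; substitute the
# identifiers your provider actually exposes.
REASONING_MODEL = "o3"
INSTRUCT_MODEL = "4o-mini"


def pick_model(turn: int, total_turns: int) -> str:
    """Return the reasoning model for the first and last turn, else the cheap one."""
    if total_turns < 1 or not 1 <= turn <= total_turns:
        raise ValueError("turn must be within 1..total_turns")
    is_bookend = turn == 1 or turn == total_turns
    return REASONING_MODEL if is_bookend else INSTRUCT_MODEL


def route(total_turns: int) -> list[str]:
    """List the model assigned to each turn, in order."""
    return [pick_model(turn, total_turns) for turn in range(1, total_turns + 1)]


if __name__ == "__main__":
    # Plan turn, four tool-execution turns, verification turn.
    for turn, model in enumerate(route(6), start=1):
        print(turn, model)
```

Keeping the selection in one pure function makes it easy to swap the rule later, for example to escalate an intermediate turn to the reasoning model when a tool call fails.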
Journey Context:
In agent architectures \(ReAct, Reflexion\), developers often default to 'strongest model everywhere.' For a 5-step web search \+ calculation task, using o3 for every step costs ~$0.80 and takes 45s \(cumulative reasoning time\). Using o3 only for the two bookend steps, step 0 \(plan generation\) and step 5 \(synthesis/verification\), and 4o-mini for intermediate steps 1-4 \(search, extract, calculate\) costs ~$0.04 and completes in 8s. Quality degradation is minimal because the tool outputs in the intermediate steps are deterministic \(search results, calculator outputs\), so those turns don't benefit from deep reasoning. Reasoning is needed for two questions: 'what should I search for?' and 'is this answer logically consistent with the constraints?' Pattern: 'Reasoning as Bookends' architecture.
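As a quick sanity check, the figures quoted in this report can be turned into reduction factors. The sketch below uses only the numbers stated above; the variable and function names are illustrative, and the figures themselves are the report's estimates, not measurements reproduced here.

```python
# Figures quoted in this report for the 5-step example task.
ALL_REASONING = {"cost_usd": 0.80, "latency_s": 45}
BOOKENDS = {"cost_usd": 0.04, "latency_s": 8}


def reduction_factor(before: float, after: float) -> float:
    """How many times smaller `after` is compared with `before`."""
    return before / after


cost_reduction = reduction_factor(ALL_REASONING["cost_usd"], BOOKENDS["cost_usd"])
latency_reduction = reduction_factor(ALL_REASONING["latency_s"], BOOKENDS["latency_s"])

if __name__ == "__main__":
    print(f"cost reduction: {cost_reduction:.0f}x")
    print(f"latency reduction: {latency_reduction:.1f}x")
```

The cost ratio matches the roughly 20x saving claimed in the fix, while the latency improvement is smaller, a bit under 6x.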
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:30:05.876607+00:00— report_created — created