Report #55700
[cost_intel] Agent loops with reasoning models cause $5+ per-request cost explosion and latency death spiral
Architect 'two-speed' agents: a planner (reasoning model) generates a static DAG of steps once, then a cheap executor model (e.g. GPT-4o-mini) runs the tool calls iteratively; escalate back to the reasoning model only on an exception.
Journey Context:
Agent frameworks default to a single model for both planning and execution. Reasoning models excel at planning (complex dependency graphs) but are pathological in execution loops: each tool call requires a fresh API call with the full context, and reasoning models bill premium rates for hidden thinking tokens on every one of those calls, so a 5-step tool loop pays the reasoning overhead five times. The 'Plan-then-Execute' pattern separates the two concerns: the reasoning model generates a JSON plan once, then a cheap, fast model executes each step and checks it off the plan. The report estimates that this reduces cost by 80-90% and latency by about 70%. Critical nuance: the executor must be allowed to hand control back to the planner when a tool output contradicts the plan's assumptions (a replanning trigger); otherwise the agent fails brittlely.
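The pattern above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a framework's actual API: `Step`, `ReplanNeeded`, `run_two_speed`, and the two stub models are hypothetical names, the plan is assumed to arrive as a JSON list already in dependency order, and the stubs stand in for real model calls so that the call counts are visible.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


class ReplanNeeded(Exception):
    """Hypothetical signal: a tool result contradicts the plan's assumptions."""


@dataclass
class Step:
    id: str
    tool: str
    args: dict
    depends_on: list[str] = field(default_factory=list)


def parse_plan(plan_json: str) -> list[Step]:
    # Assumption: the planner emits a JSON list of step objects,
    # already sorted so that dependencies come before dependents.
    return [Step(**raw) for raw in json.loads(plan_json)]


def run_two_speed(
    goal: str,
    planner: Callable[[str, str], str],    # expensive reasoning model
    executor: Callable[[Step, dict], str],  # cheap model that runs tool calls
    max_replans: int = 2,
):
    calls = {"planner": 0, "executor": 0}
    results: dict[str, str] = {}
    notes = ""  # failure context passed back to the planner on replanning
    for _ in range(max_replans + 1):
        calls["planner"] += 1
        plan = parse_plan(planner(goal, notes))
        try:
            for step in plan:
                if step.id in results:
                    continue  # already checked off in an earlier attempt
                missing = [d for d in step.depends_on if d not in results]
                if missing:
                    raise ReplanNeeded(f"step {step.id} is missing inputs {missing}")
                calls["executor"] += 1
                results[step.id] = executor(step, results)
            return results, calls
        except ReplanNeeded as exc:
            notes = str(exc)  # replanning trigger: escalate to the planner
    raise RuntimeError("replan budget exhausted")


# Stub models for illustration only; real code would call model APIs here.
def stub_planner(goal: str, notes: str) -> str:
    fetch_tool = "fetch_v2" if notes else "fetch_v1"
    return json.dumps([
        {"id": "a", "tool": "search", "args": {"q": goal}},
        {"id": "b", "tool": fetch_tool, "args": {}, "depends_on": ["a"]},
        {"id": "c", "tool": "summarize", "args": {}, "depends_on": ["b"]},
    ])


def stub_executor(step: Step, results: dict) -> str:
    if step.tool == "fetch_v1":
        raise ReplanNeeded(f"step {step.id}: tool {step.tool} is unavailable")
    return f"{step.tool} ok"


if __name__ == "__main__":
    results, calls = run_two_speed("demo goal", stub_planner, stub_executor)
    print(calls)  # {'planner': 2, 'executor': 4}
```

In this run the reasoning model is invoked twice (the initial plan plus one replan) while the cheap executor absorbs all four tool-call attempts. Capping `max_replans` is one way to keep a failing task from drifting back into an open-ended reasoning loop.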
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:59:15.698667+00:00 - report_created - created