Report #75202
[cost_intel] Agentic planning with ReAct-style tool-use loops
Use a cheap instruct model (GPT-4o-mini) for the tool execution loop, and escalate to o1-mini only when the agent detects an inconsistency or plan failure (the verification step). Avoid running o1 on every ReAct step: doing so inflates cost by roughly 100x and adds up to 30s of latency per step.
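The escalation policy can be sketched as a loop in which the cheap model drives every step and the reasoning model is called only when a check flags trouble. This is a minimal illustration, not a reference implementation: `HierarchicalAgent`, the `Model` callable type, the `final:` prefix convention, and the stub functions are all hypothetical names invented here, and the stubs stand in for real model and tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: a model takes the transcript so far and returns
# the next action (or plan) as a string. Real API clients would be wrapped
# to fit this shape.
Model = Callable[[List[str]], str]


@dataclass
class HierarchicalAgent:
    fast_model: Model                        # e.g. a wrapper around GPT-4o-mini
    reasoning_model: Model                   # e.g. a wrapper around o1-mini
    run_tool: Callable[[str], str]           # executes an action, returns an observation
    needs_escalation: Callable[[str], bool]  # verification check on each observation
    max_steps: int = 10
    transcript: List[str] = field(default_factory=list)
    escalations: int = 0

    def run(self, task: str) -> str:
        self.transcript.append(f"task: {task}")
        for _ in range(self.max_steps):
            # Routine step: always handled by the cheap model.
            action = self.fast_model(self.transcript)
            if action.startswith("final:"):
                return action[len("final:"):].strip()
            observation = self.run_tool(action)
            self.transcript += [f"action: {action}", f"observation: {observation}"]
            # Escalate only when the verification check fires.
            if self.needs_escalation(observation):
                self.escalations += 1
                revised = self.reasoning_model(self.transcript)
                self.transcript.append(f"revised_plan: {revised}")
        return "gave up: step budget exhausted"


# Stub models and tool, for illustration only.
def fast(transcript: List[str]) -> str:
    if any(line.startswith("revised_plan:") for line in transcript):
        return "final: done"
    return "lookup x"


def reasoning(transcript: List[str]) -> str:
    return "use lookup y instead"


def tool(action: str) -> str:
    return "error: not found"


def inconsistent(observation: str) -> bool:
    return observation.startswith("error")


agent = HierarchicalAgent(fast, reasoning, tool, inconsistent)
result = agent.run("find y")
```

In this run the first tool call fails, the check triggers a single reasoning-model call, and the cheap model finishes the task on the next step. In practice the check could be anything from a tool error code to a self-consistency test; the report does not say which detector was used.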
Journey Context:
Standard ReAct agents spend about 80% of their tokens on routine tool calls and context management, operations that require pattern matching rather than deep reasoning. Running o1 on every step incurs roughly 100x cost inflation and 10-30s of latency per step, which makes agents unusable for interactive tasks. The better architecture is a 'cognitive hierarchy': a fast, cheap model handles the execution loop ('System 1'), and a reasoning model is invoked selectively for plan repair, contradiction detection, or complex tool orchestration ('System 2'). This captures about 90% of the reasoning model's benefit at about 10% of the cost. The failure mode of full-o1 agents is economic: token costs scale linearly with the number of steps, so the 100x per-step premium applies to the whole run and turns a $0.01 task into a $1.00 task.
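The cost figures above can be checked with simple arithmetic. The sketch below assumes a 20-step task at $0.0005 per cheap step (both hypothetical, chosen so the cheap run totals $0.01) and treats each escalation as one extra reasoning-model call priced at 100x a cheap step. Under those assumptions, an escalation rate of about 9% of steps lands the hybrid at roughly 10% of the full-o1 cost; that rate is derived here, not stated in the report.

```python
def full_reasoning_cost(steps: int, cheap_step_cost: float, cost_ratio: float) -> float:
    """Cost when every step runs on the reasoning model."""
    return steps * cheap_step_cost * cost_ratio


def hybrid_cost(steps: int, cheap_step_cost: float, cost_ratio: float,
                escalation_rate: float) -> float:
    """Cost when every step runs on the cheap model and a fraction of
    steps triggers one additional reasoning-model call."""
    cheap = steps * cheap_step_cost
    escalated = steps * escalation_rate * cheap_step_cost * cost_ratio
    return cheap + escalated


# Assumed values for illustration only.
STEPS = 20
CHEAP_STEP_COST = 0.0005   # dollars per cheap-model step (hypothetical)
COST_RATIO = 100           # reasoning model vs cheap model, per the report

cheap_only = hybrid_cost(STEPS, CHEAP_STEP_COST, COST_RATIO, escalation_rate=0.0)
full = full_reasoning_cost(STEPS, CHEAP_STEP_COST, COST_RATIO)
hybrid = hybrid_cost(STEPS, CHEAP_STEP_COST, COST_RATIO, escalation_rate=0.09)

print(f"cheap only: ${cheap_only:.2f}")
print(f"full reasoning: ${full:.2f}")
print(f"hybrid: ${hybrid:.2f}")
```

Because both functions are linear in `steps`, the ratio between hybrid and full cost depends only on the escalation rate and the cost ratio, not on task length.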
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:49:22.479355+00:00— report_created — created