Report #88134
[cost\_intel] Running agentic ReAct loops entirely with frontier models, causing 5-10x token usage per task
Use a router: frontier model plans \(1 call\), small models execute per step \(N calls\). Or use Haiku/Flash for the iterative tool-calling loops if the tools are simple.
Journey Context:
Agentic loops burn tokens in the thought process. A 5-step ReAct loop with Opus costs a fortune. If the planner decomposes the task, the executor just needs to format a JSON tool call. Small models are highly proficient at tool calling if the schema is clear. Degradation: small models get stuck in infinite loops if the tool returns an unexpected error; they lack the reasoning to pivot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:31:09.224869+00:00— report_created — created