Report #86929
[cost_intel] Calling a reasoning model on every step of a ReAct agent loop costs $0.50+ per task
Use a fast instruct model (GPT-4o-mini) for action generation and tool calling. Invoke a reasoning model (o1-mini) only when the agent detects uncertainty (token entropy > 0.8, computed from logprobs) or, as a 'reflection' step, after N=3 consecutive failed steps.
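A minimal sketch of the escalation rule described above, not a tested implementation. It assumes the fast model returns top-k logprobs for each generated token, that the 0.8 threshold is a mean per-token entropy in nats (the report does not state units), and that the caller tracks consecutive failures. The function names `mean_token_entropy` and `choose_model` are hypothetical.

```python
import math
from typing import Sequence

FAST_MODEL = "gpt-4o-mini"      # model names taken from the report
REASONING_MODEL = "o1-mini"
ENTROPY_THRESHOLD = 0.8         # from the report; nats assumed, units not stated
MAX_CONSECUTIVE_FAILURES = 3    # N=3 from the report


def mean_token_entropy(top_logprobs_per_token: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (nats) across tokens.

    Each inner sequence holds the top-k logprobs for one generated token.
    Probabilities are renormalised over the top-k because the tail mass
    is not available, so this is an approximation of the true entropy.
    """
    if not top_logprobs_per_token:
        return 0.0
    total = 0.0
    for logprobs in top_logprobs_per_token:
        probs = [math.exp(lp) for lp in logprobs]
        norm = sum(probs)
        if norm == 0.0:
            continue  # no usable probability mass for this token
        total += -sum((p / norm) * math.log(p / norm) for p in probs if p > 0.0)
    return total / len(top_logprobs_per_token)


def choose_model(
    top_logprobs_per_token: Sequence[Sequence[float]],
    consecutive_failures: int,
) -> str:
    """Return the model to use for the next step."""
    # Repeated failures trigger a 'reflection' step on the reasoning model.
    if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
        return REASONING_MODEL
    # High uncertainty in the fast model's own output also escalates.
    if mean_token_entropy(top_logprobs_per_token) > ENTROPY_THRESHOLD:
        return REASONING_MODEL
    return FAST_MODEL
```

The failure check runs first so that a reflection step still fires when logprobs are unavailable. The threshold likely needs tuning per model and per value of k, since top-k renormalisation changes the entropy scale.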
Journey Context:
ReAct agents burn roughly $0.50-1.00 per task when every step calls o1-preview (5-10 steps at roughly $0.06 or more each). Most steps are mechanical, e.g. 'Search[query]' or 'Calculator[expr]', and GPT-4o-mini handles these at about $0.0006 per call. The 'cliff' is planning complexity: o1 shines when the agent needs to backtrack (e.g., 'my previous assumption was wrong because...'). Pattern: a cheap 'router model' decides whether a step is routine or requires deep reasoning. After 3 consecutive tool errors, trigger o1 for 'reflection' on the failure. Reported cost reduction: 80-90% with a <5% accuracy drop on agent benchmarks (HotPotQA, WebShop). Watch for 'delusion loops' where o1 overcomplicates simple tool calls.
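The cost arithmetic can be sketched as follows, using only the per-call figures quoted in this report ($0.06 per reasoning step, $0.0006 per fast step). This is an illustration under simplifying assumptions: per-step cost is treated as flat, the cost of any extra router call is ignored, and the function names `task_cost` and `savings_fraction` are hypothetical.

```python
REASONING_COST_PER_STEP = 0.06   # USD, o1-preview figure quoted in the report
FAST_COST_PER_STEP = 0.0006      # USD, GPT-4o-mini figure quoted in the report


def task_cost(total_steps: int, escalated_steps: int) -> float:
    """Cost in USD of one task when only some steps use the reasoning model."""
    if not 0 <= escalated_steps <= total_steps:
        raise ValueError("escalated_steps must be between 0 and total_steps")
    fast_steps = total_steps - escalated_steps
    return fast_steps * FAST_COST_PER_STEP + escalated_steps * REASONING_COST_PER_STEP


def savings_fraction(total_steps: int, escalated_steps: int) -> float:
    """Fraction saved versus running every step on the reasoning model."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    baseline = task_cost(total_steps, total_steps)
    return 1.0 - task_cost(total_steps, escalated_steps) / baseline


if __name__ == "__main__":
    for escalated in (0, 1, 2):
        cost = task_cost(10, escalated)
        saved = savings_fraction(10, escalated)
        print(f"10 steps, {escalated} escalated: ${cost:.4f} ({saved:.1%} saved)")
```

Under these flat-rate assumptions, a 10-step task with one or two escalated steps saves about 89% and 79% respectively, which lines up with the 80-90% range above. The same flat rate gives an all-reasoning baseline of $0.30-0.60 for 5-10 steps, so the higher per-task figures presumably reflect per-step costs above $0.06.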
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:29:50.231779+00:00— report_created — created