Report #59709
[cost\_intel] Using o1 for every step in a ReAct agent causes 30s\+ user wait times
Use a cheap, fast model \(Claude 3.5 Haiku, GPT-4o-mini\) for the tool-execution loop, and call o1/o3 only as a final 'verifier' or as a 'planner' when the loop gets stuck, not on every step.
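A minimal sketch of that routing, under stated assumptions: `TieredAgent`, `ModelFn`, `ToolFn`, the `FINAL:` / `PLAN:` prefixes, and the "same action repeated twice" stuck heuristic are all hypothetical names and conventions chosen for illustration, not part of any real SDK. The model and tool callables are stubs, so a real client would need to be wired in behind the same signatures.

```python
from dataclasses import dataclass, field
from typing import Callable, List

ModelFn = Callable[[str], str]  # hypothetical interface: prompt in, text out
ToolFn = Callable[[str], str]   # hypothetical interface: action in, observation out


@dataclass
class TieredAgent:
    fast_model: ModelFn       # cheap model used on every step
    reasoning_model: ModelFn  # slow model, called only when the loop is stuck
    run_tool: ToolFn
    max_steps: int = 10
    stuck_after: int = 2      # assumed heuristic: N identical actions in a row
    calls: List[str] = field(default_factory=list)  # which tier served each call

    def run(self, task: str) -> str:
        actions: List[str] = []
        transcript: List[str] = []
        plan = ""
        for _ in range(self.max_steps):
            prompt = "\n".join([task, plan, *transcript])
            action = self.fast_model(prompt)
            self.calls.append("fast")
            if action.startswith("FINAL:"):
                return action[len("FINAL:"):].strip()
            actions.append(action)
            transcript.append(f"{action} -> {self.run_tool(action)}")
            if self._is_stuck(actions):
                # Escalate once, then hand the plan back to the fast model.
                plan = self.reasoning_model("\n".join([task, *transcript]))
                self.calls.append("reasoning")
                actions.clear()
        return "max steps reached"

    def _is_stuck(self, actions: List[str]) -> bool:
        n = self.stuck_after
        return len(actions) >= n and len(set(actions[-n:])) == 1


# Stubs standing in for real model and tool calls.
def stub_fast(prompt: str) -> str:
    return "FINAL: done" if "PLAN:" in prompt else "search: weather"


def stub_reasoning(prompt: str) -> str:
    return "PLAN: answer with what you have"


def stub_tool(action: str) -> str:
    return "no results"


agent = TieredAgent(stub_fast, stub_reasoning, stub_tool)
print(agent.run("What is the weather in Paris?"))  # prints: done
print(agent.calls)  # ['fast', 'fast', 'reasoning', 'fast']
```

With these stubs the fast model repeats one action twice, the loop escalates a single time, and the fast model finishes the task, so only one of the four calls hits the slow tier. How "stuck" is detected is a design choice; repeated actions is only one possible signal.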
Journey Context:
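The instinct is 'better model = better agent.' But an agent loop typically needs 5-10 LLM calls, and at roughly 30s per o1 call, 10 x 30s = 5 minutes, which is unusable for an interactive product. The trick: let the fast model handle the roughly 90% of steps that are routine tool calls \(weather, search\), and reserve the reasoning model for ambiguous planning \('Do I book the flight AND hotel in parallel or in sequence?'\). In this report's experience, cost drops by about 90% and latency on the fast-model steps falls below 2s.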
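The latency arithmetic can be checked with a few lines. This is a back-of-the-envelope sketch: the 30s and 2s per-call figures are taken from the report, while the "one escalation in ten steps" split is an assumption derived from the 90% figure, and `loop_latency` is a hypothetical helper name.

```python
def loop_latency(steps: int, slow_steps: int, fast_s: float, slow_s: float) -> float:
    """Total sequential wait, in seconds, for a loop that mixes fast and slow calls."""
    return (steps - slow_steps) * fast_s + slow_steps * slow_s


# Every step on the reasoning model: 10 x 30s.
all_reasoning = loop_latency(steps=10, slow_steps=10, fast_s=2.0, slow_s=30.0)

# Nine fast steps plus one escalation: 9 x 2s + 1 x 30s.
tiered = loop_latency(steps=10, slow_steps=1, fast_s=2.0, slow_s=30.0)

print(all_reasoning)  # 300.0
print(tiered)         # 48.0
```

Under these assumptions the total wait is dominated by how often the loop escalates, so the sub-2s figure is best read as per fast step rather than for the whole task.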
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:42:34.094678+00:00— report_created — created