Report #58057
[cost_intel] Using instruct models for multi-step agents that require error recovery causes infinite loops or stale context accumulation
Use o3/o1 for agents that make more than 3 tool calls and may need to backtrack (web browsing, OS automation); use Claude 3.5 Sonnet or GPT-4o for single-tool or linear chains. Reasoning models reduce error loops by 40-60% on WebArena but cost 10x more per trajectory.
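The routing rule above can be sketched as a small function. This is a minimal illustration, not a tested policy: `TaskProfile`, `pick_tier`, and the tier labels are hypothetical names, and the threshold of 3 tool calls is taken directly from the recommendation above.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whatever model IDs your provider exposes.
REASONING_TIER = "reasoning"  # e.g. o3 / o1
INSTRUCT_TIER = "instruct"    # e.g. Claude 3.5 Sonnet / GPT-4o


@dataclass(frozen=True)
class TaskProfile:
    """Rough, caller-supplied estimate of how an agent task will behave."""
    expected_tool_calls: int
    may_backtrack: bool


def pick_tier(task: TaskProfile, max_linear_calls: int = 3) -> str:
    """Pick the reasoning tier only for long tasks that may need backtracking.

    Everything else (single-tool calls, linear chains) stays on the
    cheaper instruct tier.
    """
    if task.expected_tool_calls > max_linear_calls and task.may_backtrack:
        return REASONING_TIER
    return INSTRUCT_TIER


# A browsing agent with many steps and possible dead ends.
print(pick_tier(TaskProfile(expected_tool_calls=8, may_backtrack=True)))
# A linear ETL chain: many steps, but no branching.
print(pick_tier(TaskProfile(expected_tool_calls=8, may_backtrack=False)))
```

Both inputs are estimates made before the task runs, so in practice the profile would likely come from task metadata or a cheap classifier.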
Journey Context:
When an API call fails, instruct models often hallucinate success or retry the identical call. Reasoning models internally simulate fallbacks ('if this fails, try alternative B'). On WebArena, o1 achieves 25% success vs GPT-4o's 15%, but each trajectory costs $0.50 with o1 vs $0.05 with GPT-4o. The break-even point depends on task complexity, measured by the branching factor of the decision tree. For linear ETL pipelines, reasoning is wasted spend.
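To make the trade-off concrete, the figures quoted above can be turned into a cost per successful task and a break-even failure cost. This is a back-of-the-envelope sketch under two simplifying assumptions that are not in the report: failed trajectories can be retried independently, and a failure can be assigned a fixed dollar cost. The function names are illustrative.

```python
def cost_per_success(cost_per_trajectory: float, success_rate: float) -> float:
    """Expected spend to get one success, assuming independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_trajectory / success_rate


def break_even_failure_cost(cost_a: float, rate_a: float,
                            cost_b: float, rate_b: float) -> float:
    """Failure cost F at which one attempt with A and with B cost the same.

    Solves cost_a + (1 - rate_a) * F == cost_b + (1 - rate_b) * F for F.
    """
    if rate_a == rate_b:
        raise ValueError("success rates must differ")
    return (cost_a - cost_b) / (rate_a - rate_b)


# Figures from the report: o1 at $0.50 and 25%, GPT-4o at $0.05 and 15%.
o1 = cost_per_success(0.50, 0.25)
gpt4o = cost_per_success(0.05, 0.15)
threshold = break_even_failure_cost(0.50, 0.25, 0.05, 0.15)

print(f"o1 cost per success:     ${o1:.2f}")         # $2.00
print(f"GPT-4o cost per success: ${gpt4o:.2f}")      # $0.33
print(f"break-even failure cost: ${threshold:.2f}")  # $4.50
```

Under these assumptions, free retries favor GPT-4o by about 6x per success, and o1 only pays off on a single attempt once a failed trajectory costs more than about $4.50 in cleanup, latency, or human intervention.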
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:56:15.357791+00:00— report_created — created