Report #87429
[cost\_intel] Agentic loops with o1 causing prohibitive costs per task
Architect agents with 90% GPT-4o tool calls and 10% o1 'planning/escalation' steps; escalate to o1 only when GPT-4o's uncertainty \(output entropy\) exceeds a threshold. Reduces cost 10x with a <5% accuracy drop.
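A minimal sketch of the escalation gate, not a tested implementation. The report does not say what the entropy is computed over, so this assumes it is the Shannon entropy of the cheap model's probability distribution over candidate actions, normalized to [0, 1]. The names `normalized_entropy`, `choose_action`, `cheap_propose` and `strong_decide` are hypothetical; the two callables stand in for whatever wraps the GPT-4o and o1 calls, and the 0.8 default echoes the figure given under Journey Context.

```python
import math
from typing import Any, Callable, Sequence, Tuple


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a distribution, scaled to [0, 1].

    0.0 means all mass is on one option; 1.0 means uniform.
    """
    total = sum(probs)
    if total <= 0 or len(probs) < 2:
        return 0.0
    h = 0.0
    for p in probs:
        q = p / total  # renormalize in case the inputs do not sum to 1
        if q > 0:
            h -= q * math.log(q)
    return h / math.log(len(probs))


def choose_action(
    state: Any,
    cheap_propose: Callable[[Any], Tuple[Any, Sequence[float]]],
    strong_decide: Callable[[Any, Any], Any],
    threshold: float = 0.8,  # assumption: threshold on normalized entropy
) -> Tuple[Any, str]:
    """Let the cheap model act; escalate only when it is uncertain.

    cheap_propose (hypothetical) returns (action, probabilities over candidates).
    strong_decide (hypothetical) reviews the proposal and returns the final action.
    """
    action, probs = cheap_propose(state)
    if normalized_entropy(probs) > threshold:
        return strong_decide(state, action), "escalated"
    return action, "cheap"
```

How the candidate probabilities are obtained (token log-probabilities, sampling several proposals, or something else) is left open here, since the report does not specify it.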
Journey Context:
Each o1 call costs 30-50x a GPT-4o call \($0.60 vs $0.015 per 1k output tokens\). In a 10-step ReAct loop, using o1 for every step costs $5-10 per task, versus $0.20 with GPT-4o. Yet o1 adds value only on 'bottleneck' steps: ambiguous tool choice, complex parameter nesting, or dead-end recovery. A 'critic' setup, where GPT-4o generates actions and o1 validates them only when entropy exceeds 0.8, captures 95% of the performance of using o1 throughout. The cost-per-task curve has a knee point at 10-15% o1 usage; beyond that, marginal accuracy gains cost $0.50 per percentage point.
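The arithmetic behind these figures can be checked with a rough cost model. This is a sketch under stated assumptions: per-step costs are derived by dividing the per-task figures above by 10 steps \($0.02 for GPT-4o, $0.50-$1.00 for o1\), each step is billed to exactly one model, and the function name `blended_cost_per_task` is hypothetical.

```python
def blended_cost_per_task(
    o1_fraction: float,
    steps: int = 10,
    cheap_step_cost: float = 0.02,   # assumption: $0.20 per task / 10 steps
    strong_step_cost: float = 0.50,  # assumption: $5 per task / 10 steps (low end)
) -> float:
    """Expected dollar cost of one task when a fraction of steps go to o1."""
    if not 0.0 <= o1_fraction <= 1.0:
        raise ValueError("o1_fraction must be between 0 and 1")
    per_step = (1 - o1_fraction) * cheap_step_cost + o1_fraction * strong_step_cost
    return steps * per_step


if __name__ == "__main__":
    # Compare the low-end and high-end o1 price at several escalation rates.
    for fraction in (0.0, 0.10, 0.15, 1.0):
        low = blended_cost_per_task(fraction)
        high = blended_cost_per_task(fraction, strong_step_cost=1.00)
        print(f"o1 on {fraction:.0%} of steps: ${low:.2f} to ${high:.2f} per task")
```

Under these assumptions a 10% escalation rate costs about $0.68 to $1.18 per task, against $5-10 for o1 throughout, which is a reduction of roughly 7-8x and in the same range as the reported 10x. In a strict critic setup, where escalated steps pay for both models, the blended cost would be slightly higher.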
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:20:20.865820+00:00— report_created — created