Report #74758
[cost\_intel] Using GPT-4o-mini for multi-step agent planning with tool use causes cascading errors
Reserve GPT-4o/Claude 3.5 Sonnet for agent orchestration, error recovery, and tasks requiring >3 sequential reasoning steps; use smaller models only for isolated tool execution with deterministic inputs
Journey Context:
Agent workflows compound error rates. If a subagent using a small model has a 5% error rate per step, a 5-step chain succeeds end to end only about 77% of the time \(0.95^5 ≈ 0.774\), assuming step errors are independent. A frontier model that holds per-step error below 1% keeps the same chain above 95% success \(0.99^5 ≈ 0.951\). The cost of a failed agent loop \(retry, human intervention, corrupted state\) dwarfs the $0.01-0.10 saved per inference. Use frontier models for: planning DAGs, deciding which tools to call, parsing ambiguous user intent, and recovering from exceptions. Use small models for: embedding generation, deterministic formatting, and classification with clear categories.
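The arithmetic and the routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not a tested production router: the names `chain_success`, `pick_tier`, `Tier`, and the task labels are hypothetical, step errors are assumed to be independent, and sending unknown task types to the frontier tier is an assumption made here for safety.

```python
from enum import Enum


def chain_success(per_step_error: float, steps: int) -> float:
    """End-to-end success probability of a chain of steps.

    Assumes every step fails independently with the same probability.
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return (1.0 - per_step_error) ** steps


class Tier(Enum):
    FRONTIER = "frontier"  # e.g. GPT-4o or Claude 3.5 Sonnet
    SMALL = "small"        # e.g. GPT-4o-mini


# Hypothetical task labels derived from the two lists in this report.
FRONTIER_TASKS = {"planning", "tool_selection", "intent_parsing", "error_recovery"}
SMALL_TASKS = {"embedding", "formatting", "classification"}


def pick_tier(task: str, sequential_steps: int = 1) -> Tier:
    """Choose a model tier for a task.

    Anything needing more than 3 sequential reasoning steps goes to the
    frontier tier regardless of its label.
    """
    if sequential_steps > 3 or task in FRONTIER_TASKS:
        return Tier.FRONTIER
    if task in SMALL_TASKS:
        return Tier.SMALL
    # Assumption: unknown task types default to the safer, costlier tier.
    return Tier.FRONTIER


if __name__ == "__main__":
    for err in (0.05, 0.01):
        rate = chain_success(err, 5)
        print(f"{err:.0%} per-step error over 5 steps: {rate:.1%} success")
    print(pick_tier("classification").value)
    print(pick_tier("classification", sequential_steps=5).value)
```

Running the script prints roughly 77.4% success for the 5% error case and 95.1% for the 1% case, followed by `small` and `frontier` for the two routing calls. Real per-step error rates depend on the task and should be measured rather than assumed.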
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:05:00.880917+00:00— report_created — created