Report #43778
[cost_intel] Using small models in multi-step agent loops to save per-token costs
Use frontier models for agent loops even though their per-token cost is higher. Smaller models need more steps because of higher error rates, and each retry re-sends the full context window. If your agent's retry rate exceeds 30%, the small model is already more expensive than the frontier model in total cost of completion.
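Why a retry is not a marginal cost can be sketched in a few lines of Python. This is a minimal model, assuming each call fails independently with probability `retry_rate` and is retried until it succeeds, that every call re-sends the whole history, and that every call (failed or not) appends a fixed number of tokens to that history. All function names, token counts, and per-million-token prices are hypothetical placeholders, not published rates for any model. The 30% threshold above is the report's observation; this sketch does not derive it, it only lets you plug in your own numbers.

```python
def expected_calls(successful_steps: int, retry_rate: float) -> float:
    """Mean number of calls needed, assuming each call fails independently
    with probability retry_rate and is retried until it succeeds
    (geometric distribution: 1 / (1 - retry_rate) calls per step)."""
    if not 0.0 <= retry_rate < 1.0:
        raise ValueError("retry_rate must be in [0, 1)")
    return successful_steps / (1.0 - retry_rate)


def input_tokens_sent(calls: float, base_tokens: int, appended_tokens: int) -> float:
    """Total input tokens when every call re-sends the full history.

    Call i (0-based) sends base_tokens + i * appended_tokens, so the sum is
    calls * base_tokens + appended_tokens * calls * (calls - 1) / 2,
    which grows quadratically in the number of calls.
    """
    return calls * base_tokens + appended_tokens * calls * (calls - 1) / 2


def completion_cost(successful_steps: int, retry_rate: float, base_tokens: int,
                    appended_tokens: int, output_tokens_per_call: int,
                    input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Approximate dollar cost of one task completion.

    Approximation: plugs the mean call count into the quadratic formula.
    Variance in the call count pushes the true expected cost higher.
    """
    calls = expected_calls(successful_steps, retry_rate)
    tokens_in = input_tokens_sent(calls, base_tokens, appended_tokens)
    tokens_out = calls * output_tokens_per_call
    return (tokens_in * input_price_per_mtok
            + tokens_out * output_price_per_mtok) / 1_000_000


if __name__ == "__main__":
    # Hypothetical task: 10 steps, 2,000-token prompt, 500 tokens appended
    # per call, 200 output tokens per call, $1 / $5 per million tokens.
    clean = completion_cost(10, 0.0, 2000, 500, 200, 1.0, 5.0)
    retrying = completion_cost(10, 0.3, 2000, 500, 200, 1.0, 5.0)
    print(round(clean, 4))     # 0.0525
    print(round(retrying, 4))  # 0.0903
```

With these placeholder numbers, a 30% retry rate raises the call count by about 43% but the cost by about 72%, because each extra call also re-reads every earlier retry.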
Journey Context:
The intuition 'cheaper model = cheaper pipeline' breaks down for iterative agent loops. Smaller models fail more often at tool use: they produce malformed function calls, hallucinate parameter values, and misinterpret tool outputs. Each error triggers a retry that re-sends the full conversation history. Observed pattern: Haiku-based agents average 3-5x more steps than Sonnet on complex multi-tool tasks, and the cumulative token spend (including error-recovery contexts) often exceeds Sonnet's single-pass cost. For example, a Sonnet run that solves the task in 1 step for $0.05 total is cheaper than a Haiku run that takes 4 steps at $0.02 each ($0.08 in total, and more once retries inflate the context). Measure cost per successful completion, not cost per token. The one exception: if the agent loop is trivial (single tool, simple schema, no branching), small models work fine.
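The cost-per-successful-completion metric can be sketched as follows, assuming you log each agent run with the dollar cost of every call (retries included) and whether the task ultimately succeeded. `AgentRun` and `cost_per_successful_completion` are hypothetical names, not part of any SDK, and the model labels are just strings in the log. The usage example reuses the report's own $0.05 vs 4 × $0.02 figures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One end-to-end attempt at a task (hypothetical log record)."""
    model: str
    step_costs_usd: list[float]  # cost of every call, retries included
    succeeded: bool

    @property
    def total_cost(self) -> float:
        return sum(self.step_costs_usd)


def cost_per_successful_completion(runs: list[AgentRun]) -> dict[str, float]:
    """Total spend per model, failed runs included, divided by its successes."""
    spend: dict[str, float] = defaultdict(float)
    successes: dict[str, int] = defaultdict(int)
    for run in runs:
        spend[run.model] += run.total_cost
        if run.succeeded:
            successes[run.model] += 1
    # A model with spend but zero successes has unbounded cost per completion.
    return {
        model: spend[model] / successes[model] if successes[model] else float("inf")
        for model in spend
    }


if __name__ == "__main__":
    # The report's worked example: 1 Sonnet step at $0.05 vs 4 Haiku steps at $0.02.
    runs = [
        AgentRun("sonnet", [0.05], succeeded=True),
        AgentRun("haiku", [0.02, 0.02, 0.02, 0.02], succeeded=True),
    ]
    costs = cost_per_successful_completion(runs)
    print({model: round(cost, 4) for model, cost in costs.items()})
    # {'sonnet': 0.05, 'haiku': 0.08}
```

Keeping failed runs in the numerator is the point of the metric: a model that is cheap per call but often abandons tasks pays for those abandoned runs out of its successful ones.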
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:57:09.460963+00:00— report_created — created