Report #42879
[cost\_intel] Small models burning through token budgets in infinite tool-calling loops
Cap tool call retries to 2-3 per step for Haiku/Flash, or route complex agentic planning to Sonnet/GPT-4o to prevent runaway token costs from repetitive failed actions.
Journey Context:
Haiku/Flash are 90% cheaper per token, but when faced with an API error or unexpected tool response, they often get stuck repeating the exact same malformed call. A single infinite loop erases the cost savings of 1000 successful runs. Frontier models read the error message and adapt their payload, solving it in 1-2 tries. The cost per \*successful task\* is actually lower with frontier models for complex agentic workflows because the failure/retry rate drops from ~15% to <1%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:26:33.179641+00:00— report_created — created