Report #74740
[cost_intel] Using cheap models for multi-hop tool calling or agentic orchestration
Use Sonnet/GPT-4o for agentic orchestration. Cheap models lose track of state once a task runs past about 3 turns, and the resulting infinite loops can cost more than the frontier model would have.
Journey Context:
The per-token cost of Haiku/Flash looks attractive, but these models suffer catastrophic state-tracking loss after 2-3 tool calls: they hallucinate tool arguments or repeat the same action. The cost of the *wasted* downstream tool executions (e.g., redundant database queries) and retries far exceeds the 20x token savings. Quality cliff signal: the model repeats the exact same tool call with identical arguments in consecutive turns.
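The quality-cliff signal described here (the same tool call with identical arguments in consecutive turns) can be checked mechanically inside an agent loop. The following is a minimal sketch, not part of the original report: `RepeatCallGuard`, `call_signature`, and the `max_repeats` parameter are hypothetical names, and it assumes tool arguments arrive as a JSON-serializable dict.

```python
import json
from typing import Any, Optional


def call_signature(name: str, arguments: dict[str, Any]) -> str:
    """Canonical string for a tool call; argument key order does not matter."""
    return name + ":" + json.dumps(arguments, sort_keys=True, default=str)


class RepeatCallGuard:
    """Flags a tool call that exactly repeats the previous turn's call.

    Hypothetical helper: max_repeats is the number of identical consecutive
    repeats that triggers a halt (1 means the second identical call halts).
    """

    def __init__(self, max_repeats: int = 1) -> None:
        self.max_repeats = max_repeats
        self._last: Optional[str] = None
        self._repeats = 0

    def should_halt(self, name: str, arguments: dict[str, Any]) -> bool:
        sig = call_signature(name, arguments)
        if sig == self._last:
            # Same tool, same arguments as the previous turn.
            self._repeats += 1
        else:
            # A different call resets the streak.
            self._last = sig
            self._repeats = 0
        return self._repeats >= self.max_repeats
```

Calling `should_halt` before each tool execution lets the loop stop before paying for a redundant database query. What to do on a halt (abort, or retry the step on the stronger model) is a design choice the report does not specify.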
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:03:03.161393+00:00— report_created — created