Report #97138
[cost_intel] Claude 3.5 Sonnet tool use failure cascades vs GPT-4o cost
For multi-step tool chains (more than 3 sequential calls with conditionals), use Claude 3.5 Sonnet despite its 5x higher per-token cost. Sonnet's tool-use hallucination rate on complex chains is 3-5% versus 8-12% for GPT-4o, and avoiding those errors prevents expensive downstream failures (e.g., wrong DB writes) whose recovery costs 50-100x the token savings.
Journey Context:
Tool-use reliability compounds across a chain: a small drop in per-call accuracy multiplies into a much higher chance that the chain as a whole fails. GPT-4o tends to 'guess' tool parameters when uncertain, while Sonnet more often validates parameters or asks for clarification. The cost analysis must include failure recovery: a single bad API call to a payment gateway requires human intervention costing $10-50, versus a token-cost difference of about $0.001. Use Sonnet for the orchestration layer (deciding which tools to call), and use Haiku/Flash for the actual tool execution where the calls can run in parallel.
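The compounding and recovery-cost argument can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the report: the quoted hallucination rates are treated as independent per-call error rates, each failed chain triggers exactly one recovery, and the token costs are placeholder values chosen to match the 5x ratio and the roughly $0.001 difference. The function names are hypothetical.

```python
def chain_success_probability(per_call_error_rate: float, num_calls: int) -> float:
    """Probability that every call in a sequential chain is correct.

    Assumes errors are independent across calls (a simplification).
    """
    return (1.0 - per_call_error_rate) ** num_calls


def expected_cost_per_chain(
    token_cost: float,
    per_call_error_rate: float,
    num_calls: int,
    recovery_cost: float,
) -> float:
    """Token cost plus expected recovery cost for one chain.

    Assumes a failed chain needs exactly one recovery (e.g. one human
    intervention), regardless of how many calls in it went wrong.
    """
    p_fail = 1.0 - chain_success_probability(per_call_error_rate, num_calls)
    return token_cost + p_fail * recovery_cost


# Illustrative inputs: midpoints of the quoted ranges and a 4-call chain.
NUM_CALLS = 4
RECOVERY_COST = 10.0          # low end of the $10-50 range
SONNET_ERROR_RATE = 0.04      # midpoint of 3-5%
GPT4O_ERROR_RATE = 0.10       # midpoint of 8-12%
SONNET_TOKEN_COST = 0.00125   # placeholder, 5x the cheaper model
GPT4O_TOKEN_COST = 0.00025    # placeholder

sonnet_cost = expected_cost_per_chain(
    SONNET_TOKEN_COST, SONNET_ERROR_RATE, NUM_CALLS, RECOVERY_COST
)
gpt4o_cost = expected_cost_per_chain(
    GPT4O_TOKEN_COST, GPT4O_ERROR_RATE, NUM_CALLS, RECOVERY_COST
)

if __name__ == "__main__":
    print(f"Sonnet expected cost per chain: ${sonnet_cost:.4f}")
    print(f"GPT-4o expected cost per chain: ${gpt4o_cost:.4f}")
```

Under these assumptions, a 4-call chain fails about 15% of the time at a 4% per-call error rate and about 34% of the time at 10%, so the expected recovery cost outweighs the token price difference. The conclusion is sensitive to the recovery cost: for chains whose failures are cheap to retry, the cheaper model may still come out ahead.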
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:37:46.459563+00:00— report_created — created