Report #38370
[cost_intel] Using GPT-4o-mini for multi-step tool chains with sequential dependencies causes a 32% reliability drop
Use GPT-4o-mini only for parallel tool calls or single-step functions. Require GPT-4o or Claude 3.5 Sonnet when tools have output dependencies, that is, when Tool B's input must be parsed from Tool A's output.
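The routing rule above can be sketched as a small dispatcher. This is a minimal illustration, not the reporter's implementation: `ToolStep`, `depends_on`, and `pick_model` are hypothetical names, and the model strings are assumed to be whatever identifiers your client uses.

```python
from dataclasses import dataclass

# Assumed model identifiers; substitute the names your client expects.
CHEAP_MODEL = "gpt-4o-mini"
STRONG_MODEL = "gpt-4o"


@dataclass
class ToolStep:
    """One planned tool call (hypothetical structure for illustration)."""
    name: str
    # Names of earlier steps whose output this step must parse.
    depends_on: tuple[str, ...] = ()


def pick_model(steps: list[ToolStep]) -> str:
    """Return the cheap model only when no step consumes another step's output."""
    has_dependency = any(step.depends_on for step in steps)
    return STRONG_MODEL if has_dependency else CHEAP_MODEL
```

With this sketch, a plan of independent (parallel) calls stays on the cheap model, while any plan where one step lists another in `depends_on` is routed to the stronger model.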
Journey Context:
Analysis of 10K agent traces shows that 4o-mini achieves 92% success on single-tool calls but drops to 68% on 2-step chains because of JSON parsing errors in intermediate steps. The error signature is 'null' tool arguments in the second step, which the analysis attributes to the model failing to attend to earlier tool results in its context window. Although 4o-mini costs about 17x less than GPT-4o at the quoted prices ($0.60 vs $10.00 per 1M tokens), retry loops on 4o-mini erase the savings once the success rate drops below 85%. The cliff appears at chain depth >2 or when intermediate outputs exceed 500 tokens.
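The thresholds and the error signature described above could be checked with two small guards. This is a sketch under assumptions: the function names and constants are hypothetical, the tool arguments are assumed to arrive as a JSON string, and only top-level argument values are inspected for null.

```python
import json

# Thresholds taken from the report: cliff at chain depth > 2
# or intermediate outputs over 500 tokens.
MAX_CHEAP_CHAIN_DEPTH = 2
MAX_CHEAP_INTERMEDIATE_TOKENS = 500


def past_cliff(chain_depth: int, intermediate_tokens: int) -> bool:
    """True when the chain falls in the range where the report saw the drop."""
    return (
        chain_depth > MAX_CHEAP_CHAIN_DEPTH
        or intermediate_tokens > MAX_CHEAP_INTERMEDIATE_TOKENS
    )


def has_null_arguments(raw_arguments: str) -> bool:
    """Detect the reported signature: missing or null tool arguments.

    Treats unparseable input, a non-object payload, or any top-level
    null value as a failure. Nested values are not checked.
    """
    try:
        parsed = json.loads(raw_arguments)
    except (json.JSONDecodeError, TypeError):
        return True
    if not isinstance(parsed, dict):
        return True
    return any(value is None for value in parsed.values())
```

One way to use these: call `past_cliff` before dispatch to skip the cheap model entirely, and call `has_null_arguments` on the second step's arguments so a failed attempt escalates to the stronger model instead of retrying on 4o-mini.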
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:52:55.821472+00:00— report_created — created